Systems and methods for adjusting targeted bit allocation based on an occupancy level of a VBV buffer model

ABSTRACT

The invention is related to methods and apparatus that advantageously improve bit rate control in a video encoder, such as an MPEG video encoder. One embodiment of the invention advantageously varies the targeted bit allocation for a picture to be encoded based on an occupancy level of a buffer model, such as a video buffer verifier (VBV) buffer model.

RELATED APPLICATION

This application is a continuation patent application of co-pending application Ser. No. 10/452,769, filed on 30 May 2003.

APPENDIX A

Appendix A, which forms a part of this disclosure, is a list of commonly owned copending U.S. patent applications. Each one of the applications listed in Appendix A is hereby incorporated herein in its entirety by reference thereto.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The invention generally relates to video encoding techniques. In particular, the invention relates to adjusting a target number of bits used to encode a picture, in part to achieve bit rate control and to maintain buffer occupancy.

2. Description of the Related Art

A variety of digital video compression techniques have arisen to transmit or to store a video signal with a lower data rate or with less storage space. Such video compression techniques include international standards, such as H.261, H.263, H.263+, H.263++, H.264, MPEG-1, MPEG-2, MPEG-4, and MPEG-7. These compression techniques achieve relatively high compression ratios by discrete cosine transform (DCT) techniques and motion compensation (MC) techniques, among others. Such video compression techniques permit video data streams to be efficiently carried across a variety of digital networks, such as wireless cellular telephony networks, computer networks, cable networks, via satellite, and the like, and to be efficiently stored on storage mediums such as hard disks, optical disks, Video Compact Discs (VCDs), digital video discs (DVDs), and the like. The encoded data streams are decoded by a video decoder that is compatible with the syntax of the encoded data stream.

For relatively high image quality, video encoding can consume a relatively large amount of data. However, the communication networks that carry the video data can limit the data rate that is available for encoding. For example, a data channel in a direct broadcast satellite (DBS) system or a data channel in a digital cable television network typically carries data at a relatively constant bit rate (CBR) for a programming channel. In addition, a storage medium, such as the storage capacity of a disk, can also place a constraint on the number of bits available to encode images.

As a result, a video encoding process often trades off image quality against the number of bits used to compress the images. Moreover, video encoding can be relatively complex. For example, where implemented in software, the video encoding process can consume relatively many CPU cycles. Further, the time constraints applied to an encoding process when video is encoded in real time can limit the complexity with which encoding is performed, thereby limiting the picture quality that can be attained.

One conventional method for rate control and quantization control for an encoding process is described in Chapter 10 of Test Model 5 (TM5) from the MPEG Software Simulation Group (MSSG). TM5 suffers from a number of shortcomings. An example of such a shortcoming is that TM5 does not guarantee compliance with Video Buffer Verifier (VBV) requirements. As a result, overrunning and underrunning of a decoder buffer can occur, which undesirably results in the freezing of a sequence of pictures and the loss of data.

SUMMARY OF THE INVENTION

The invention is related to methods and apparatus that advantageously improve bit rate control in a video encoder, such as an MPEG video encoder. One embodiment of the invention advantageously varies the targeted bit allocation for a picture to be encoded based on an occupancy level of a buffer model, such as a video buffer verifier (VBV) buffer model.

BRIEF DESCRIPTION OF THE DRAWINGS

These and other features of the invention will now be described with reference to the drawings summarized below. These drawings and the associated description are provided to illustrate preferred embodiments of the invention and are not intended to limit the scope of the invention.

FIG. 1 illustrates an example of a sequence of pictures.

FIG. 2 illustrates an example of an encoding environment in which an embodiment of the invention can be used.

FIG. 3 illustrates an example of decoding environments, which can include a decoder buffer.

FIG. 4 is a block diagram that generally illustrates the relationship between an encoder, a decoder, data buffers, and a constant-bit-rate data channel.

FIG. 5 is a chart that generally illustrates buffer occupancy as a function of time, as data is provided to a buffer at a constant bit rate while the data is consumed by the decoder at a variable bit rate.

FIG. 6 consists of FIGS. 6A and 6B and is a flowchart that generally illustrates rate control and quantization control in a video encoder.

FIG. 7 is a flowchart that generally illustrates a process for adjusting a targeted bit allocation based at least in part on an occupancy level of a virtual buffer.

FIG. 8A is a flowchart that generally illustrates a sequence of processing macroblocks according to the prior art.

FIG. 8B is a flowchart that generally illustrates a sequence of processing macroblocks according to one embodiment.

FIG. 9A is a flowchart that generally illustrates a process for stabilizing the encoding process from the deleterious effects of bit stuffing.

FIG. 9B is a flowchart that generally illustrates a process for resetting virtual buffer occupancy levels upon the detection of an irregularity in a final buffer occupancy level.

FIG. 10A illustrates examples of groups of pictures (GOPs).

FIG. 10B is a flowchart that generally illustrates a process for resetting encoding parameters upon the detection of a scene change within a group of pictures (GOP).

FIG. 11 is a flowchart that generally illustrates a process for the selective skipping of data in a video encoder to reduce or eliminate the occurrence of decoder buffer underrun.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

Although this invention will be described in terms of certain preferred embodiments, other embodiments that are apparent to those of ordinary skill in the art, including embodiments that do not provide all of the benefits and features set forth herein, are also within the scope of this invention. Accordingly, the scope of the invention is defined only by reference to the appended claims.

FIG. 1 illustrates a sequence of pictures 102. While embodiments of the invention are described in the context of MPEG-2 and pictures, the principles and advantages described herein are also applicable to other video standards including H.261, H.263, MPEG-1, and MPEG-4, as well as video standards yet to be developed. The term “picture” will be used herein and encompasses pictures, images, frames, visual object planes (VOPs), and the like. A video sequence includes multiple video images usually taken at periodic intervals. The rate at which the pictures or frames are displayed is referred to as the picture rate or frame rate. The pictures in a sequence of pictures can correspond to either interlaced images or to non-interlaced images, i.e., progressive images. In an interlaced image, each image is made of two separate fields, which are interlaced together to create the image. No such interlacing is performed in a non-interlaced or progressive image.

The sequence of pictures 102 can correspond to a movie or other presentation. It will be understood that the sequence of pictures 102 can be of finite duration, such as with a movie, or can be of unbounded duration, such as for a media channel in a direct broadcast satellite (DBS) system. An example of a direct broadcast satellite (DBS) system is known as DIRECTV®. As shown in FIG. 1, the pictures in the sequence of pictures 102 are grouped into units known as groups of pictures such as the illustrated first group of pictures 104. A first picture 106 of the first group of pictures 104 corresponds to an I-picture. The other pictures in the group of pictures can correspond to P-pictures or to B-pictures.

In MPEG-2, a picture is further divided into smaller units known as macroblocks. It will be understood that in other video standards, such as MPEG-4, a picture can be further divided into other units, such as visual object planes (VOPs). Returning now to MPEG-2, an I-picture is a picture in which all macroblocks are intra coded, such that an image can be constructed without data from another picture. A P-picture is a picture in which all the macroblocks are either intra coded or forward predictively coded. The macroblocks for a P-picture can be encoded or decoded based on data for the picture itself, i.e., intra coded, or based on data from a picture that is earlier in the sequence of pictures, i.e., forward predictively coded. A B-picture is a picture in which the macroblocks can be intra coded, forward predictively coded, backward predictively coded, or a combination of forward and backward predictively coded, i.e., interpolated. During an encoding and/or a decoding process for a sequence of pictures, the B-pictures will typically be encoded and/or decoded after surrounding I-pictures and/or P-pictures are encoded and/or decoded. An advantage of using predictively-coded macroblocks over intra-coded macroblocks is that the number of bits used to encode predictively-coded macroblocks can be dramatically less than the number of bits used to encode intra-coded macroblocks.

The macroblocks include sections for storing luminance (brightness) components and sections for storing chrominance (color) components. It will be understood by one of ordinary skill in the art that the video data stream can also include corresponding audio information, which is also encoded and decoded.

FIG. 2 illustrates an example of an encoding environment in which an embodiment of the invention can be used. A source for unencoded video 202 provides the unencoded video as an input to an encoder 204. The source for unencoded video 202 can be embodied by a vast range of devices, such as, but not limited to, video cameras, sampled video tape, sampled films, computer-generated sources, and the like. The source for unencoded video 202 can even include a decoder that decodes encoded video data. The source for unencoded video 202 can be external to the encoder 204 or can be incorporated in the same hardware as the encoder 204. In another example, the source for unencoded video 202 is a receiver for analog broadcast TV signals that samples the analog images for storage in a digital video recorder, such as a set-top box known as TiVo®.

The encoder 204 can also be embodied in a variety of forms. For example, the encoder 204 can be embodied by dedicated hardware, such as in an application specific integrated circuit (ASIC), by software executing in dedicated hardware, or by software executing in a general-purpose computer. The software can include instructions that are embodied in a tangible medium, such as a hard disk or optical disk. In addition, the encoder 204 can be used with other encoders to provide multiple encoded channels for use in direct broadcast satellite (DBS) systems, digital cable networks, and the like. For example, the encoded output of the encoder 204 is provided as an input to a server 206 together with the encoded outputs of other encoders as illustrated in FIG. 2. The server 206 can be used to store the encoded sequence in mass storage 208, in optical disks such as a DVD 210 for DVD authoring applications, Video CD (VCD), and the like. The server 206 can also provide the data from the encoded sequence to a decoder via an uplink 212 to a satellite 214 for a direct broadcast satellite (DBS) system, to the Internet 216 for streaming of the encoded sequence to remote users, and the like. It will be understood that an encoded sequence can be distributed in a variety of other mediums including local area networks (LANs), other types of wide area networks (WANs), wireless networks, terrestrial digital broadcasts of television signals, cellular telephone networks, dial-up networks, peer-to-peer networks, and the like. In one embodiment, the encoder 204 encodes the sequence of pictures in real time. In another embodiment, the encoder 204 encodes the sequence of pictures asynchronously. Other environments in which the encoder 204 can be incorporated include digital video recorders, digital video cameras, dedicated hardware video encoders and the like.

FIG. 3 illustrates an example of decoding environments, which include decoder buffers that are modeled during the encoding process by a Video Buffer Verifier (VBV) buffer. An encoded sequence of pictures can be decoded and viewed in a wide variety of environments. Such environments include reception of direct broadcast satellite (DBS) signals via satellite dishes 302 and set top boxes, playback by digital video recorders, playback through a DVD player 304, reception of terrestrial digital broadcasts, and the like. For example, a television set 306 can be used to view the images, but it will be understood that a variety of display devices can be used.

For example, a personal computer 308, a laptop computer 310, a cell phone 312, and the like can also be used to view the encoded images. In one embodiment, these devices are configured to receive the video images via the Internet 216. The Internet 216 can be accessed via a variety of networks, such as wired networks and wireless networks.

FIG. 4 is a block diagram that generally illustrates the relationship between an encoder 402, an encoder buffer 404, a decoder 406, a decoder buffer 408, and a constant-bit-rate data channel 410. In another embodiment, the bit rate of the constant-bit-rate data channel can vary slightly from channel-to-channel depending on a dynamic allocation of data rates among multiplexed data channels. For the purposes of this application, this nearly constant bit rate with a slight variation in data rate that can occur as a result of a dynamic allocation of data rate among multiplexed data channels will be considered as a constant bit rate. For example, the encoder 402 can correspond to an encoder for a programming channel in a direct broadcast satellite (DBS) system, and the decoder 406 can correspond to a decoder in a set-top box that receives direct broadcast satellite (DBS) signals. The skilled practitioner will appreciate that the data rate of the constant-bit-rate data channel 410 for actual video data may be less than the data rate of the constant-bit-rate data channel 410 itself because some of the actual transmission data may be occupied for overhead purposes, such as for error correction and for packaging of data. The skilled practitioner will appreciate that the methods described herein are applicable not only to constant-bit-rate encoding, as described in the MPEG standard document, but also to variable-bit-rate encoding. For the case of variable bit-rate, the transmission bit rate can be described in terms of a long-term average over a time period that can be a few seconds, a few minutes, a few hours, or any other suitable time-interval, together with a maximal bit rate that can be used to provide data to a decoder buffer. Data can be provided from the channel to the decoder buffer at the maximal bit rate until the decoder buffer is full; at that point, the data channel waits for decoding of the next picture, which will remove some data from the decoder buffer, and then transfer of data from the channel to the decoder buffer resumes. The term “bit rate” used hereafter can be either some constant bit rate or a long-term average of variable bit rate encoding. In one embodiment of a constant bit rate encoder, the encoder produces a data stream with a relatively constant bit rate over a group of pictures.

For streaming applications such as a direct broadcast satellite (DBS) system or for recording of live broadcasts such as in a home digital video recorder, the encoder 402 receives and encodes the video images in real time. The output of the encoder 402 can correspond to a variable bit rate (VBR) output 412. The variable bit rate (VBR) output 412 of the encoder 402 is temporarily stored in the encoder buffer 404. A function of the encoder buffer 404 and the decoder buffer 408 is to hold data temporarily such that data can be stored and retrieved at different data rates. It should be noted that the encoder buffer 404 and the decoder buffer 408 do not need to be matched, and that the encoder buffer 404 is a different buffer than a video buffer verifier (VBV) buffer, which is used by the encoder 402 to model the occupancy of the decoder buffer 408 during the encoding process.

The encoder buffer 404 can be implemented in dedicated memory or can be efficiently implemented by sharing system memory, such as the existing system memory of a personal computer. Where the memory used for the encoder buffer 404 is shared, the encoder buffer 404 can be termed a “virtual buffer.” It will be understood that larger memories, such as mass storage, can also be used to store video data streams and portions thereof.

The encoder buffer 404 buffers the relatively short-term fluctuations of the variable bit rate (VBR) output 412 of the encoder 402 such that the encoded data can be provided to the decoder 406 via the constant-bit-rate data channel 410. Similarly, the decoder buffer 408 can be used to receive the encoded data at the relatively constant bit rate of the constant-bit-rate data channel 410 and provide the encoded data to the decoder 406 as needed, which can be at a variable bit rate. The decoder buffer 408 can also be implemented in dedicated memory or in a shared memory, such as the system memory of a personal computer. Where implemented in a shared memory, the decoder buffer 408 can also correspond to a virtual buffer.

The MPEG standards specify a size for the decoder buffer 408. The size of the decoder buffer 408 is specified such that an MPEG-compliant data stream can be reliably decoded by a standard decoder. In the MPEG-2 standard, which for example is used in the encoding of a DVD, the buffer size specified is about 224 kB. In the MPEG-1 standard, which for example is used in the encoding of a video compact disc (VCD), the buffer size is specified to be about 40 kB. It will be understood by one of ordinary skill in the art that the actual size of the encoder buffer 404 and/or the decoder buffer 408 can be determined by a hardware designer or by a software developer by varying from the standard.

Although it will be understood that the actual size of the decoder buffer 408 can vary from the standard, there exist practical limitations that affect the size and occupancy of the decoder buffer 408. When the size of the decoder buffer 408 is increased, this can correspondingly increase the delay encountered when a sequence is selected and playback is initiated. For example, when a user changes the channel of a direct broadcast satellite (DBS) set-top box or skips forwards or backwards while viewing a DVD, the retrieved data is stored in the decoder buffer 408 before it is retrieved by the decoder 406 for playback. When the decoder buffer 408 is of a relatively large size, this can result in an infuriatingly long delay between selection of a sequence and playback of the sequence. Moreover, as will be described later in connection with FIG. 5, the encoded data can specify when playback is to commence, such that playback can begin before the decoder buffer 408 is completely full of data.

In one embodiment, playback of a sequence begins upon the earlier of two conditions. A first condition is a time specified by the MPEG data stream. A parameter that is carried in the MPEG data stream known as vbv-delay provides an indication of the length of time that data for a sequence should be buffered in the decoder buffer 408 before the initiation of playback by the decoder 406. The vbv-delay parameter corresponds to a 16-bit number that ranges from 0 to 65,535. The value for the vbv-delay parameter is counted down by the decoder 406 by a 90 kHz clock signal such that the amount of time delay, in seconds, specified by the vbv-delay parameter corresponds to the value divided by 90,000. For example, the maximum value for the vbv-delay of 65,535 thereby corresponds to a time delay of about 728 milliseconds (ms). It will be understood that the vbv-delay can initiate playback of the sequence at a time other than when the decoder buffer 408 is full so that even if the decoder buffer 408 is relatively large, the occupancy of the decoder buffer 408 can be relatively low.
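The conversion described above can be sketched in a few lines of Python. This is only an illustration of the stated arithmetic; the names `vbv_delay_to_seconds`, `VBV_DELAY_CLOCK_HZ`, and `VBV_DELAY_MAX` are hypothetical and do not come from the MPEG syntax or from this disclosure.

```python
# Illustrative constants taken from the description above.
VBV_DELAY_CLOCK_HZ = 90_000   # 90 kHz countdown clock
VBV_DELAY_MAX = 0xFFFF        # largest 16-bit value, 65,535


def vbv_delay_to_seconds(vbv_delay: int) -> float:
    """Convert a 16-bit vbv-delay count into a delay in seconds."""
    if not 0 <= vbv_delay <= VBV_DELAY_MAX:
        raise ValueError("vbv-delay must fit in 16 bits (0..65535)")
    return vbv_delay / VBV_DELAY_CLOCK_HZ


if __name__ == "__main__":
    # Maximum delay, in milliseconds (about 728 ms, as stated in the text).
    print(f"{vbv_delay_to_seconds(VBV_DELAY_MAX) * 1000:.1f} ms")
```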

A second condition corresponds to the filling of the decoder buffer 408. It will be understood that if data continues to be provided to the decoder buffer 408 after the decoder buffer 408 has filled and has not been emptied, that some of the data stored in the decoder buffer 408 will typically be lost. To prevent the loss of data, the decoder 406 can initiate playback at a time earlier than the time specified by the vbv-delay parameter. For example, when the size of the decoder buffer 408 corresponds to the specified 224 kB buffer size, bit-rates that exceed 2.52 Mega bits per second (Mbps) can fill the decoder buffer 408 in less time than the maximum time delay specified by the vbv-delay parameter.
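The 2.52 Mbps figure can be checked with a short calculation. The sketch below assumes that 1 kB means 1,024 bytes, which is the interpretation that reproduces the stated threshold; the function names are hypothetical.

```python
def fill_time_seconds(buffer_bytes: int, bit_rate_bps: float) -> float:
    """Time for a constant-rate channel to fill an initially empty buffer."""
    return buffer_bytes * 8 / bit_rate_bps


def bit_rate_to_fill_within(buffer_bytes: int, seconds: float) -> float:
    """Bit rate at which the buffer fills in exactly the given time."""
    return buffer_bytes * 8 / seconds


if __name__ == "__main__":
    buffer_bytes = 224 * 1024       # assumption: 1 kB = 1,024 bytes
    max_delay = 65_535 / 90_000     # maximum vbv-delay, in seconds
    threshold = bit_rate_to_fill_within(buffer_bytes, max_delay)
    print(f"threshold: {threshold / 1e6:.2f} Mbps")
```

At any bit rate above this threshold, `fill_time_seconds` returns a value shorter than the maximum vbv-delay, so the buffer-full condition is reached first.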

The concept of the VBV buffer in the MPEG specification is intended to constrain the MPEG data stream such that decoding of the data stream does not result in an underrun or an overrun of the decoder buffer 408. It will be understood that the VBV buffer model does not have to be an actual buffer and does not actually have to store data. However, despite the existence of the VBV buffer concept, the video encoding techniques taught in MPEG's Test Model 5 (TM5) do not guarantee VBV compliance, and buffer underrun and overrun can occur.

Buffer underrun of the decoder buffer 408 occurs when the decoder buffer 408 runs out of data. This can occur when the bit rate of the constant-bit-rate data channel 410 is less than the bit rate at which data is consumed by the decoder 406 for a relatively long period of time. This occurs when the encoder 402 has used too many bits to encode the sequence relative to a specified bit rate. A visible artifact of buffer underrunning in the decoder buffer 408 is a temporary freeze in the sequence of pictures.

Buffer overrun of the decoder buffer 408 occurs when the decoder buffer 408 receives more data than it can store. This can occur when the bit rate of the constant-bit-rate data channel 410 exceeds the bit rate consumed by the decoder 406 for a relatively long period of time. This occurs when the encoder 402 has used too few bits to encode the sequence relative to the specified bit rate. As a result, the decoder buffer 408 is unable to store all of the data that is provided from the constant-bit-rate data channel 410, which can result in a loss of data. This type of buffer overrun can be prevented by “bit stuffing,” which is the sending of data that is not used by the decoder 406 so that the number of bits used by the decoder 406 matches with the number of bits sent by the constant-bit-rate data channel 410 over a relatively long period of time. However, bit stuffing can introduce other problems as described in greater detail later in connection with FIGS. 9A and 9B.

The VBV buffer model concept is used by the encoder 402 in an attempt to produce a video data stream that will preferably not result in buffer underrun or overrun in the decoder buffer 408. In one embodiment, the occupancy levels of the VBV buffer model are monitored to produce a video data stream that does not result in buffer underrun or overrun in the decoder buffer 408. It should be noted that overrun and underrun in the encoder buffer 404 and in the decoder buffer 408 are not the same. For example, the conditions that result in a buffer underrun in the decoder buffer 408, i.e., an encoded bit rate that exceeds the bit rate of the constant-bit-rate data channel 410 for a sustained period of time, can also result in buffer overrun in the encoder buffer 404. Further, the conditions that result in a buffer overrun in the decoder buffer 408, i.e., an encoded bit rate that is surpassed by the bit rate of the constant-bit-rate data channel 410 for a sustained period of time, can also result in a buffer underrun in the encoder buffer 404.
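The underrun and overrun conditions described above can be illustrated with a simplified occupancy model. The following is a minimal sketch under stated assumptions, not the process of any embodiment: data arrives at a constant bit rate, each picture is removed instantaneously once per picture period, an underrun empties the model, and an overrun clamps it at the buffer size. The names `VbvStep` and `simulate_vbv` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class VbvStep:
    picture_index: int
    occupancy_after_removal: float
    occupancy_after_fill: float
    status: str  # "ok", "underrun", or "overrun"


def simulate_vbv(picture_bits: Sequence[float], bit_rate: float,
                 picture_rate: float, buffer_size: float,
                 initial_occupancy: float) -> List[VbvStep]:
    """Track a simplified decoder-buffer model, one step per picture."""
    bits_per_period = bit_rate / picture_rate  # channel bits per picture period
    occupancy = float(initial_occupancy)
    steps = []
    for index, bits in enumerate(picture_bits):
        status = "ok"
        if bits > occupancy:
            # The picture needs more bits than have arrived: underrun.
            status = "underrun"
            occupancy = 0.0  # modeling choice for this sketch only
        else:
            occupancy -= bits
        after_removal = occupancy
        occupancy += bits_per_period
        if occupancy > buffer_size:
            # More data arrived than the buffer can hold: overrun.
            if status == "ok":
                status = "overrun"
            occupancy = float(buffer_size)
        steps.append(VbvStep(index, after_removal, occupancy, status))
    return steps


if __name__ == "__main__":
    # Hypothetical numbers chosen only to exercise the model.
    for step in simulate_vbv([200, 50, 50, 50, 50, 400], bit_rate=1000,
                             picture_rate=10, buffer_size=500,
                             initial_occupancy=300):
        print(step)
```

An encoder that monitors such a model before committing to a picture size could, in principle, reduce the bit target when an underrun is predicted or pad the stream when an overrun is predicted.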

FIG. 5 is a chart that generally illustrates decoder buffer occupancy as data is provided to a decoder buffer at a constant bit rate while data is consumed by a decoder at a variable bit rate. In a conventional system based on MPEG TM5, the data stream provided to the decoder disadvantageously does not guarantee that the decoder buffer is prevented from buffer underrun or overrun conditions. In the illustrated example, the data is provided to the decoder buffer at a constant bit rate and the decoder uses the data to display the video in real time.

Time (t) 502 is indicated along a horizontal axis. Increasing time is indicated towards the right. Decoder buffer occupancy 504 is indicated along a vertical axis. In the beginning, the decoder buffer is empty. A maximum level for the buffer is represented by a B_(MAX) 528 level. An encoder desirably produces a data stream that maintains the data in the buffer below the B_(MAX) 528 level and above an empty level. For example, the decoder buffer can be flushed in response to a skip within a program, in response to changing the selected channel in a direct broadcast satellite (DBS) system or in a digital cable television network, and the like. The decoder monitors the received data for a system clock reference (SCR), as indicated by SCR(0) 506. The system clock reference (SCR) is a time stamp for a reference clock that is embedded into the bit stream by the encoder and is used by the decoder to synchronize time with the time stamps for video information that are also embedded in the bit stream. The time stamps indicate when video information should be decoded, indicate when the video should be displayed, and also permit the synchronization of visual and audio samples.

An example of a picture type pattern that is commonly used in real-time video encoding is a presentation order with a repeating pattern of IBBPBBPBBPBBPBB. Despite the fact that I-pictures consume relatively large amounts of data, the periodic use of I-pictures is helpful, for example, to permit a picture to be displayed in a relatively short period of time after a channel change in a DBS system.

The picture presentation or display order can vary from the picture encoding and decoding order. B-pictures depend on surrounding I- or P-pictures and not on other B-pictures, so that I- or P-pictures occurring after a B-picture in a presentation order will often be encoded, transmitted, and decoded prior to the encoding, transmitting, and decoding of the B-picture. For example, the relatively small portion of the sequence illustrated in FIG. 5 includes data for pictures in the order of IPBBP, as a P-picture on which the B-pictures depend is typically encoded and decoded prior to the encoding and decoding of the B-pictures, even though the pictures may be displayed in an order of IBBPBBPBBPBBPBB. It will be understood that audio data in the video presentation will typically not be ordered out of sequence. Table I summarizes the activity of the decoder with respect to time. For clarity, the illustrated GOP will be described as having only the IPBBP pictures and it will be understood that GOPs will typically include more than the five pictures described in connection with FIG. 5.

TABLE I

time    activity
< T₀    data accumulates in the buffer
T₀      I-picture is decoded
T₁      I-picture is presented, first P-picture is decoded
T₂      first B-picture is decoded and presented
T₃      second B-picture is decoded and presented
T₄      first P-picture is presented, second P-picture is decoded
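The reordering from presentation order to encoding and decoding order described above can be sketched as follows. This is a simplified illustration: it treats each picture as a single letter, moves each I- or P-picture ahead of the B-pictures that precede it in display order, and simply appends any trailing B-pictures, whereas a real stream would hold them until the next reference picture arrives. The function name `display_to_coding_order` is hypothetical.

```python
def display_to_coding_order(display_order: str) -> str:
    """Reorder picture types so each reference picture precedes its B-pictures."""
    coded = []
    pending_b = []  # B-pictures waiting for their following reference picture
    for picture_type in display_order:
        if picture_type == "B":
            pending_b.append(picture_type)
        else:
            # An I- or P-picture is coded first, then the B-pictures that
            # are displayed before it.
            coded.append(picture_type)
            coded.extend(pending_b)
            pending_b.clear()
    coded.extend(pending_b)  # simplification for trailing B-pictures
    return "".join(coded)


if __name__ == "__main__":
    print(display_to_coding_order("IBBPBBP"))
```

For the display order IBBPBBP, the first five pictures of the result are IPBBP, which is the portion of the sequence discussed in connection with FIG. 5.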

In one embodiment, the decoder buffer ignores data until a picture header with a presentation time stamp (PTS) for an I-frame is detected. This time is indicated by a time TTS₀(0) 508 in FIG. 5. This bypassing of data prevents the buffering of data for part of a picture or frame or the buffering of data that cannot be decoded by itself. After the time TTS₀(0) 508, the decoder buffer begins to accumulate data as indicated by the ramp R₀ 510.

For a time period τ₀(0) 512, the decoder buffer accumulates the data before using the data. This time period τ₀(0) 512 is also known as a pre-loading delay. Along the top of FIG. 5 are references for time that are spaced approximately evenly apart with a picture period equal to the inverse of the frame rate or inverse of the picture rate (1/R_(f)) 514. As will be described later, the location in time for the pictures can be indicated by time stamps for the corresponding pictures. At a time T₀ 516, the decoder retrieves an amount of data corresponding to the first picture of a group of pictures (GOP), which is an I-picture. The data stream specifies the time to decode the I-picture in a decoding time stamp (DTS), which is shown as a time stamp DTS₀(0) 518 and specifies the time T₀ 516.

The retrieval of data corresponding to the I-picture is indicated by the relatively sharp decrease 520 in decoder buffer occupancy. For clarity, the extraction of data from the decoder buffer is drawn as occurring instantaneously, but it will be understood by one of ordinary skill in the art that a relatively small amount of time can be used to retrieve the data. Typically, I-pictures will consume a relatively large amount of data, P-pictures will consume a relatively smaller amount of data, and B-pictures will consume a relatively small amount of data. However, the skilled practitioner will appreciate that intra macroblocks, which consume a relatively large amount of data, can be present in P-pictures and in B-pictures, as well as in I-pictures, such that P-pictures and B-pictures can also consume relatively large amounts of data. The I-picture that is decoded at the time T₀ 516 is not yet displayed at the time T₀ 516, as a presentation time stamp PTS₀(1) 522 specifies presentation at a time T₁ 524.

At the time T₁ 524, the decoder displays the picture corresponding to the I-picture that was decoded at the time T₀ 516. The time period PTS_OFFSET 526 illustrates the delay from the start of accumulating data in the decoder buffer for the selected sequence to the presentation of the first picture. A decoding time stamp DTS₀(1) 530 instructs the decoder to decode the first P-picture in the sequence at the time T₁ 524. The extraction of data from the decoder buffer is illustrated by a decrease 532 in buffer occupancy. Between the time T₀ 516 and the time T₁ 524, the decoder buffer accumulates additional data as shown by a ramp 534. A presentation time stamp PTS₀(4) 536 instructs the decoder to display the first P-picture at a time T₄ 538. In this example, the first P-picture is decoded earlier than it is presented such that the B-pictures, which can include backward predictively, forward predictively, or even bi-directionally predictively coded macroblocks, can be decoded.

At a time T₂ 540, the decoder decodes and displays the first B-picture as specified by a presentation time stamp PTS₀(2) 542. No decoding time stamp (DTS) is present because both the decoding and presenting occur in the same time period. It will be understood that in actual decoders, there can be a relatively small delay between the decoding and the displaying to account for computation time and other latencies. The amount of data that is typically used by a B-picture is relatively small as illustrated by a relatively small decrease 550 in decoder buffer occupancy for the first B-picture. It will be understood, however, that B-pictures can also include intra macroblocks that can consume a relatively large amount of data.

At a time T₃ 546, the decoder decodes and displays the second B-picture as specified by a presentation time stamp PTS₀(3) 548.

At the time T₄ 538, the decoder displays the first P-picture that was originally decoded at the time T₁ 524. At the time T₄ 538, the decoder also decodes a second P-picture as specified by the second P-picture's decoding time stamp DTS₀(4) 554. The second P-picture will be presented at a later time, as specified by a presentation time stamp (not shown). The decoder continues to decode and to present other pictures. For example, at a time T₅ 544, the decoder may decode and present a B-frame, depending on what is specified by the data stream.

Rate Control and Quantization Control Process

FIG. 6 is a flowchart that generally illustrates a rate control and quantization control process in a video encoder. It will be appreciated by the skilled practitioner that the illustrated process can be modified in a variety of ways without departing from the spirit and scope of the invention. For example, in another embodiment, various portions of the illustrated process can be combined, can be rearranged in an alternate sequence, can be removed, and the like. In another embodiment, selected portions of the illustrated process are replaced with processes from a rate control and quantization control process as disclosed in Chapter 10 of Test Model 5. The rate at which bits are consumed to encode pictures affects the occupancy of the decoder buffer during encoding. As illustrated by brackets in FIG. 6, portions of the process are related to bit allocation, to rate control, and to adaptive quantization. Bit allocation relates to estimating the number of bits that should be used to encode the picture to be encoded. Rate control relates to determining the reference quantization parameter Q_(j) that should be used to encode a macroblock. Adaptive quantization relates to analyzing the spatial activity in the macroblocks in order to modify the reference quantization parameter Q_(j) and calculate the value of the quantization parameter mquant_(j) that is used to quantize a macroblock.

The process begins at a state 602, where the process receives its first group of pictures. It will be understood that in one embodiment, the process may retrieve only a portion of the first group of pictures in the state 602 and retrieve remaining portions of the first group of pictures later. In the illustrated process, the pictures are grouped into groups of pictures before the pictures are processed by the rate control and quantization control process. A group of pictures starts with an I-picture and can include other pictures. Typically, but not necessarily, the other pictures in the group of pictures are related to the I-picture. The process advances from the state 602 to a state 604.

In the state 604, the process receives the mode or type of encoding that is to be applied to the pictures in the group of pictures. In the illustrated rate control and quantization control process, the decision as to which mode or type of encoding is to be used for each picture in the group of pictures is made before the pictures are processed by the rate control and quantization control process. For example, the pictures in the group of pictures described earlier in connection with FIG. 5 have types IPBBP. The process advances from the state 604 to a state 606.

In the state 606, the process determines the number of P-pictures N_(p) and the number of B-pictures N_(b) in the group of pictures to be encoded. For example, in the group of pictures with types IPBBP, there are two P-pictures and there are two B-pictures to be encoded such that a value for N_(p) is 2 and a value for N_(b) is also 2. There is no need to track the number of I-pictures remaining, as the only I-picture in a group of pictures is the first picture. The process advances from the state 606 to a state 608.
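The counting performed in the state 606 can be sketched as follows. This is a minimal illustration only; the function name `count_remaining_pictures` and the representation of the group of pictures as a string of type letters are assumptions made for the example and do not appear in the described process.

```python
def count_remaining_pictures(gop_types):
    """Return (N_p, N_b) for a group of pictures given as a string of
    picture types, e.g. "IPBBP" (hypothetical representation)."""
    # A group of pictures starts with its only I-picture, so the
    # I-picture count does not need to be tracked.
    if not gop_types or gop_types[0] != "I":
        raise ValueError("a group of pictures must start with an I-picture")
    return gop_types.count("P"), gop_types.count("B")


n_p, n_b = count_remaining_pictures("IPBBP")
```

For the IPBBP example above, both counts evaluate to 2.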

In the state 608, the process initializes values for complexity estimators X_(i), X_(p), and X_(b) and for the remaining number of bits R allocated to the group of pictures that is to be encoded. In one embodiment, the process initializes the values for the complexity estimators X_(i), X_(p), and X_(b) according to Equations 1-3.

$\begin{matrix}{X_{i} = \frac{160 \cdot {bit\_ rate}}{115}} & ( {{Eq}.\mspace{14mu} 1} ) \\{X_{p} = \frac{60 \cdot {bit\_ rate}}{115}} & ( {{Eq}.\mspace{14mu} 2} ) \\{X_{b} = \frac{42 \cdot {bit\_ rate}}{115}} & ( {{Eq}.\mspace{14mu} 3} )\end{matrix}$

In Equations 1-3, the variable bit_rate corresponds to the relatively constant bit rate (in bits per second) of the data channel, such as the constant-bit-rate data channel 410 described earlier in connection with FIG. 4. In another embodiment, bit_rate corresponds to the average or desired average bit rate of a variable bit rate channel. In yet another embodiment, bit_rate corresponds to a piece-wise constant bit rate value of a variable bit rate channel.

In one embodiment, the initial value R₀ for the remaining number of bits R at the start of the sequence, i.e., the initial value of R before encoding of the first group of pictures, is expressed in Equation 4. At the start of the sequence, there is no previous group of pictures and as a result, there is no carryover in the remaining number of bits from a previous group of pictures. Further updates to the value for the remaining number of bits R will be described later in connection with Equations 27 and 28.

$\begin{matrix}{R_{0} = G} & ( {{Eq}.\mspace{14mu} 4} ) \\{G = \frac{{bit\_ rate} \cdot N}{picture\_ rate}} & ( {{Eq}.\mspace{14mu} 5} )\end{matrix}$

The variable G represents the number of bits that can be transferred by the data channel in an amount of time corresponding to the length of the presentation time for the group of pictures. This amount of time varies with the number of pictures in the group of pictures. In Equation 5, the variable bit_rate is in bits per second, the value of N corresponds to the number of pictures in the group of pictures (of all types), and the variable picture_rate is in pictures or frames per second. The process then advances from the state 608 to a state 610.
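The initialization of the state 608 can be sketched as follows, as a minimal illustration of Equations 1-5. The function names and the sample values (a 4 Mbit/s channel, 30 pictures per second, and a five-picture group) are assumptions chosen for the example, not values taken from the described embodiments.

```python
def initial_complexity(bit_rate):
    """Equations 1-3: initial complexity estimators X_i, X_p, X_b."""
    x_i = 160.0 * bit_rate / 115.0
    x_p = 60.0 * bit_rate / 115.0
    x_b = 42.0 * bit_rate / 115.0
    return x_i, x_p, x_b


def gop_bit_budget(bit_rate, n_pictures, picture_rate):
    """Equation 5: bits G carried by the channel during the presentation
    time of one group of pictures."""
    return bit_rate * n_pictures / picture_rate


# Assumed sample values, for illustration only.
bit_rate = 4_000_000      # bits per second
picture_rate = 30.0       # pictures per second
n = 5                     # pictures in the group (e.g. IPBBP)

x_i, x_p, x_b = initial_complexity(bit_rate)
r0 = gop_bit_budget(bit_rate, n, picture_rate)   # Equation 4: R_0 = G
```

With these sample values, R₀ is roughly 667 kbits, and the estimators keep the fixed 160:60:42 ratio of Equations 1-3 regardless of the bit rate.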

In the state 610, the process calculates an initial target number of bits T_(i), T_(p), or T_(b), i.e., an initial target bit allocation, for the picture that is to be encoded. It should be noted that the pictures in a group of pictures will typically be encoded out of sequence when B-pictures are encoded. In one embodiment, the rate control and quantization control process calculates the initial target bit allocation for the picture according to the equation from Equations 6-8 for the corresponding picture type that is to be encoded.

$\begin{matrix}{T_{i} = {\max \{ {( \frac{R}{( {1 + \frac{N_{p}X_{p}}{X_{i}K_{p}} + \frac{N_{b}X_{b}}{X_{i}K_{b}}} )} ),( \frac{bit\_ rate}{8 \cdot {picture\_ rate}} )} \}}} & ( {{Eq}.\mspace{14mu} 6} ) \\{T_{p} = {\max \{ {( \frac{R}{( {N_{p} + \frac{N_{b}K_{p}X_{b}}{K_{b}X_{p}}} )} ),( \frac{bit\_ rate}{8 \cdot {picture\_ rate}} )} \}}} & ( {{Eq}.\mspace{14mu} 7} ) \\{T_{b} = {\max \{ {\frac{R}{( {N_{b} + \frac{N_{p}K_{b}X_{p}}{K_{p}X_{b}}} )},\frac{bit\_ rate}{8 \cdot {picture\_ rate}}} \}}} & ( {{Eq}.\mspace{14mu} 8} )\end{matrix}$

In Equation 6, T_(i) corresponds to the target bit allocation for the next picture to be encoded when the picture is the I-picture that starts a group of pictures, and T_(i) is determined by the higher of the two expressions in the braces. In Equation 7, T_(p) corresponds to the target bit allocation for the next picture to be encoded when the next picture is a P-picture. In Equation 8, T_(b) corresponds to the target bit allocation for the picture when the picture is a B-picture. The values of the “universal constants” K_(p) and K_(b) depend on the quantization matrices that are used to encode the pictures. It will be understood that the values for K_(p) and K_(b) can vary. In one embodiment, the values for K_(p) and K_(b) are 1.0 and 1.4, respectively. In another embodiment, the value of these constants can be changed according to the characteristics of the encoded pictures, such as amount and type of motion, texture, color and image detail.
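The initial target bit allocation of the state 610 can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `target_bits` and its argument order are hypothetical, the defaults for K_(p) and K_(b) are the 1.0 and 1.4 of the embodiment mentioned above, and the P-picture denominator is written in the TM5 form N_(p) + N_(b)·K_(p)·X_(b)/(K_(b)·X_(p)), which mirrors the structure of Equation 8.

```python
def target_bits(picture_type, r_bits, n_p, n_b, x_i, x_p, x_b,
                bit_rate, picture_rate, k_p=1.0, k_b=1.4):
    """Initial target bit allocation T_i, T_p, or T_b (Equations 6-8)."""
    # Lower bound shared by all three equations.
    floor = bit_rate / (8.0 * picture_rate)
    if picture_type == "I":
        denom = 1.0 + (n_p * x_p) / (x_i * k_p) + (n_b * x_b) / (x_i * k_b)
    elif picture_type == "P":
        denom = n_p + (n_b * k_p * x_b) / (k_b * x_p)
    elif picture_type == "B":
        denom = n_b + (n_p * k_b * x_p) / (k_p * x_b)
    else:
        raise ValueError("picture_type must be 'I', 'P', or 'B'")
    return max(r_bits / denom, floor)


# Assumed sample values: complexity estimators in the 160:60:42 ratio of
# Equations 1-3, two P-pictures, and two B-pictures.
t_i = target_bits("I", 600_000.0, 2, 2, 160.0, 60.0, 42.0, 4_000_000, 30.0)
t_p = target_bits("P", 600_000.0, 2, 2, 160.0, 60.0, 42.0, 4_000_000, 30.0)
t_b = target_bits("B", 600_000.0, 2, 2, 160.0, 60.0, 42.0, 4_000_000, 30.0)
```

With these sample values the three denominators evaluate to 2.125, 3, and 6, so the I-picture receives the largest share of the remaining bits R and a B-picture the smallest.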

In one embodiment of the rate control and quantization control process, the process further adjusts the target bit allocation T_((i,p,b)) from the initial target bit allocation depending on the projected buffer occupancy of the decoder buffer as will be described in greater detail later in connection with FIG. 7.

When the process has determined the target bit allocation for the next picture to be encoded, the process advances from the state 610 to a state 612. Also, the bits allocated to a picture are further allocated among the macroblocks of the picture. This macroblock bit allocation can be calculated by conventional techniques, such as techniques described in TM5, or by the techniques described herein in greater detail later in connection with a state 614. In addition, various orders or sequences in which a picture can advantageously be processed when encoded into macroblocks will be described in greater detail later in connection with FIGS. 8A and 8B.

In the state 612, the process sets initial values for virtual buffer fullness. In one embodiment, there is a virtual buffer for each picture type. The variables d_(j) ^(i), d_(j) ^(p), and d_(j) ^(b) represent the virtual buffer fullness for I-pictures, for P-pictures, and for B-pictures, respectively. The variable j represents the number of the macroblock that is being encoded and starts at a value of 1. A value of 0 for j represents the initial condition. The virtual buffer fullness, i.e., the value of d_(j) ^(i), d_(j) ^(p), or d_(j) ^(b), corresponds to the virtual buffer fullness prior to encoding the j-th macroblock such that the virtual buffer fullness corresponds to the fullness at macroblock (j−1). In one embodiment, the initial values are set according to Equations 9-11.

$\begin{matrix}{d_{0}^{i} = {10 \cdot \frac{r}{31}}} & ( {{Eq}.\mspace{14mu} 9} ) \\{d_{0}^{p} = {K_{p} \cdot d_{0}^{i}}} & ( {{Eq}.\mspace{14mu} 10} ) \\{d_{0}^{b} = {K_{b} \cdot d_{0}^{i}}} & ( {{Eq}.\mspace{14mu} 11} )\end{matrix}$

One example of a computation for the value of the reaction parameter r that appears in Equation 9 is expressed by Equation 12. It will be understood by one of ordinary skill in the art that other formulas for the calculation of the reaction parameter r can also be used.

$\begin{matrix}{r = {2 \cdot \frac{bit\_ rate}{picture\_ rate}}} & ( {{Eq}.\mspace{14mu} 12} )\end{matrix}$

With respect to Equations 10 and 11, K_(p) and K_(b) correspond to the “universal constants” described earlier in connection with Equations 6-8. The process can advance from the state 612 to the state 614 or can skip to a state 616 as will be described in connection with the state 614.
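The initialization of the state 612 can be sketched as follows, as a minimal illustration of Equations 9-12. The function names are hypothetical, and the defaults of 1.0 and 1.4 for K_(p) and K_(b) are the sample values of one embodiment described earlier.

```python
def reaction_parameter(bit_rate, picture_rate):
    """Equation 12: one example formula for the reaction parameter r."""
    return 2.0 * bit_rate / picture_rate


def initial_virtual_buffers(r, k_p=1.0, k_b=1.4):
    """Equations 9-11: initial fullness d_0 of the I-, P-, and B-picture
    virtual buffers."""
    d0_i = 10.0 * r / 31.0
    d0_p = k_p * d0_i
    d0_b = k_b * d0_i
    return d0_i, d0_p, d0_b


# Assumed sample channel: 4 Mbit/s at 30 pictures per second.
r = reaction_parameter(4_000_000, 30.0)
d0_i, d0_p, d0_b = initial_virtual_buffers(r)
```

Because Equation 18 later maps fullness to a quantizer through the factor 31/r, the initial I-picture fullness of Equation 9 corresponds to a starting reference quantization parameter of about 10.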

In the state 614, the process updates the calculations for virtual buffer fullness, i.e., the value for d_(j) ^(i), d_(j) ^(p), or d_(j) ^(b). The value d_(j) ^(i), d_(j) ^(p), or d_(j) ^(b) that is updated depends on the picture type, e.g., the d_(j) ^(i) value is updated when an I-picture is encoded. The process updates the calculations for the virtual buffer fullness to account for the bits used to encode the macroblock. The update to the virtual buffer fullness should correspond to the technique used to allocate the bits among the macroblocks of a picture. For example, where TM5 is followed, the allocation of bits within the macroblocks of a picture can be approximately linear, i.e., constant. In one embodiment, the bits are also advantageously allocated among macroblocks based on the relative motion of a macroblock within a picture (for P-pictures and B-pictures), rather than an estimate of the relative motion.

Equations 13a, 14a, and 15a generically describe the update to the calculations for virtual buffer fullness.

d _(j) ^(i) =d ₀ ^(i) +B _(j-1) −TMB _(j-1) ^(i)  (Eq. 13a)

d _(j) ^(p) =d ₀ ^(p) +B _(j-1) −TMB _(j-1) ^(p)  (Eq. 14a)

d _(j) ^(b) =d ₀ ^(b) +B _(j-1) −TMB _(j-1) ^(b)  (Eq. 15a)

The variable B_(j) corresponds to the number of bits that have already been used to encode the macroblocks in the picture that is being encoded, including the bits used in macroblock j such that the variable B_(j-1) corresponds to the number of bits that have been used to encode the macroblocks up to but not including the j-th macroblock. The variables TMB_(j-1) ^(i), TMB_(j-1) ^(p), and TMB_(j-1) ^(b) correspond to the bits allocated to encode the macroblocks up to but not including the j-th macroblock.

Equations 13b, 14b, and 15b express calculations for virtual buffer fullness, i.e., values for d_(j) ^(i), d_(j) ^(p), or d_(j) ^(b), as used in the process described by TM5. Disadvantageously, the TM5 process allocates bits within a picture without regard to motion of macroblocks such that macroblocks that should have bits allocated variably to accommodate rapid motion, such as the macroblocks that encode the movement of an athlete, have the same bits allocated as macroblocks that are relatively easy to encode.

$\begin{matrix}{d_{j}^{i} = {d_{0}^{i} + B_{j - 1} - ( \frac{T_{i} \cdot ( {j - 1} )}{MB\_ cnt} )}} & ( {{{Eq}.\mspace{14mu} 13}b} ) \\{d_{j}^{p} = {d_{0}^{p} + B_{j - 1} - ( \frac{T_{p} \cdot ( {j - 1} )}{MB\_ cnt} )}} & ( {{{Eq}.\mspace{14mu} 14}b} ) \\{d_{j}^{b} = {d_{0}^{b} + B_{j - 1} - ( \frac{T_{b} \cdot ( {j - 1} )}{MB\_ cnt} )}} & ( {{{Eq}.\mspace{14mu} 15}b} )\end{matrix}$

In one embodiment, the updated values are expressed by Equations 13c, 14c, and 15c. The use of Equations 13c, 14c, and 15c permits the bits of a picture to be advantageously allocated to macroblocks based on the motion activity of a macroblock within the picture. Advantageously, such allocation can permit the bits of a picture to be allocated to macroblocks based on a computation of the relative motion of the macroblock rather than a constant amount or an estimate of the motion. The variable allocation of bits among the macroblocks of a picture will be described in greater detail later in connection with FIGS. 8A and 8B.

$\begin{matrix}{d_{j}^{i} = {d_{0}^{i} + B_{j - 1} - ( \frac{T_{i} \cdot {Mact\_ sum}_{j - 1}}{MACT} )}} & ( {{{Eq}.\mspace{14mu} 13}c} ) \\{d_{j}^{p} = {d_{0}^{p} + B_{j - 1} - ( \frac{T_{p} \cdot {Mact\_ sum}_{j - 1}}{MACT} )}} & ( {{{Eq}.\mspace{14mu} 14}c} ) \\{d_{j}^{b} = {d_{0}^{b} + B_{j - 1} - ( \frac{T_{b} \cdot {Mact\_ sum}_{j - 1}}{MACT} )}} & ( {{{Eq}.\mspace{14mu} 15}c} )\end{matrix}$

The variable MACT represents the sum of the motion activity of all of the macroblocks as expressed in Equation 16. The variable Mact_sum_(j-1) corresponds to the sum of the motion activity of all of the macroblocks in the picture that have been encoded, i.e., the macroblocks up to but not including macroblock j, as expressed in Equation 17.

$\begin{matrix}{{MACT} = {\sum\limits_{k = 1}^{MB\_ cnt}{Mact}_{k}}} & ( {{Eq}.\mspace{14mu} 16} ) \\{{Mact\_ sum}_{j - 1} = {\sum\limits_{k = 1}^{j - 1}{Mact}_{k}}} & ( {{Eq}.\mspace{14mu} 17} )\end{matrix}$

In Equation 16, the parameter MB_cnt corresponds to the number of macroblocks in the picture and the variable Mact_(k) corresponds to the motion activity measure of the luminance of the k-th macroblock. A variety of techniques can be used to compute the motion activity measure such as variance computations and sum of absolute difference computations.

In another embodiment, the updated values for the occupancy of the virtual buffers d_(j) ^(i), d_(j) ^(p), or d_(j) ^(b) are calculated based on the corresponding equations for updated virtual buffer occupancy described in Chapter 10 of the TM5 model from MPEG.

In another embodiment, the updated values for the occupancy of the virtual buffers d_(j) ^(i), d_(j) ^(p), or d_(j) ^(b) are calculated based on Equations 13d, 14d, and 15d.

$\begin{matrix}{d_{j}^{i} = {d_{0}^{i} + B_{j - 1} - ( {{\alpha_{i}\frac{T_{i} \cdot ( {j - 1} )}{MB\_ cnt}} + {( {1 - \alpha_{i}} )\frac{T_{i} \cdot {Mact\_ sum}_{j - 1}}{MACT}}} )}} & ( {{{Eq}.\mspace{14mu} 13}d} ) \\{d_{j}^{p} = {d_{0}^{p} + B_{j - 1} - ( {{\alpha_{p}\frac{T_{p} \cdot ( {j - 1} )}{MB\_ cnt}} + {( {1 - \alpha_{p}} )\frac{T_{p} \cdot {Mact\_ sum}_{j - 1}}{MACT}}} )}} & ( {{{Eq}.\mspace{14mu} 14}d} ) \\{d_{j}^{b} = {d_{0}^{b} + B_{j - 1} - ( {{\alpha_{b}\frac{T_{b} \cdot ( {j - 1} )}{MB\_ cnt}} + {( {1 - \alpha_{b}} )\frac{T_{b} \cdot {Mact\_ sum}_{j - 1}}{MACT}}} )}} & ( {{{Eq}.\mspace{14mu} 15}d} )\end{matrix}$

In Equations 13d, 14d, and 15d, α_(i), α_(p), and α_(b) correspond to weighting factors that can range from about 0 to about 1. These weighting factors α_(i), α_(p), and α_(b) permit the bits of a picture to be advantageously allocated to macroblocks based on a combination of the relatively equal proportioning from TM5 and the proportioning based on motion activity described earlier in connection with Equations 13c, 14c, and 15c. This combined allocation can advantageously compensate for bits that are relatively evenly allocated, such as bits for overhead. The values for the weighting factors α_(i), α_(p), and α_(b) can vary widely within the range of about 0 to about 1. In one embodiment, the weighting factors α_(i), α_(p), and α_(b) range from about 0 to about 0.5. For example, sample values for these weighting factors can correspond to values such as 0.2, 0.3, 0.4 and 0.5. Other values within the range of about 0 to about 1 will be readily determined by one of ordinary skill in the art. One embodiment of the video encoder permits a user to configure the values for the weighting factors α_(i), α_(p), and α_(b).
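The virtual buffer update of the state 614 can be sketched for a single picture type as follows. This is a minimal illustration only: the function name `virtual_buffer_fullness`, the list-based inputs, and the sample numbers are assumptions. A weighting factor of 1 reduces the computation to the linear TM5 form of Equations 13b-15b, and a weighting factor of 0 reduces it to the motion-activity form of Equations 13c-15c.

```python
def virtual_buffer_fullness(d0, bits_used, target, mact, alpha):
    """Return [d_1, ..., d_MB_cnt] per Equations 13d-15d for one picture.

    d0        -- initial fullness d_0 for this picture type
    bits_used -- bits actually spent on macroblocks 1..MB_cnt
    target    -- target bit allocation T for the picture
    mact      -- motion activity Mact_k of macroblocks 1..MB_cnt
    alpha     -- weighting factor in the range 0..1
    """
    mb_cnt = len(mact)
    total_mact = sum(mact)      # MACT, Equation 16 (assumed to be > 0)
    fullness = []
    b_prev = 0.0                # B_{j-1}
    mact_sum_prev = 0.0         # Mact_sum_{j-1}, Equation 17
    for j in range(1, mb_cnt + 1):
        linear = target * (j - 1) / mb_cnt
        motion = target * mact_sum_prev / total_mact
        fullness.append(d0 + b_prev - (alpha * linear + (1.0 - alpha) * motion))
        # Account for macroblock j before moving on to macroblock j + 1.
        b_prev += bits_used[j - 1]
        mact_sum_prev += mact[j - 1]
    return fullness


# Assumed sample picture with four macroblocks.
bits = [10, 20, 30, 40]
mact = [1, 1, 2, 4]
d_motion = virtual_buffer_fullness(100.0, bits, 100.0, mact, alpha=0.0)
d_linear = virtual_buffer_fullness(100.0, bits, 100.0, mact, alpha=1.0)
d_blend = virtual_buffer_fullness(100.0, bits, 100.0, mact, alpha=0.5)
```

In every case the first value equals d_0, consistent with the observation that the update can be skipped for the first macroblock.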

The values for the occupancy of the virtual buffers d_(j) ^(i), d_(j) ^(p), or d_(j) ^(b) are computed for each macroblock in the picture. It will be understood, however, that the value for the first macroblock, i.e., d₁ ^(i), d₁ ^(p), or d₁ ^(b), is the same as the initial values set in the state 612 such that the state 614 can be skipped for the first macroblock. The process advances from the state 614 to the state 616.

In the state 616, the process computes the reference quantization parameter Q_(j) that is to be used to quantize macroblock j. Equation 18 expresses a computation for the reference quantization parameter Q_(j). The process advances from the state 616 to a state 619.

$\begin{matrix}{Q_{j} = ( \frac{d_{j} \cdot 31}{r} )} & ( {{Eq}.\mspace{14mu} 18} )\end{matrix}$

In the state 619, the process computes the normalized spatial activity measures N_Sact_(j) for the macroblocks. In one embodiment, the process computes the normalized spatial activity measures N_Sact_(j) in accordance with the TM5 process and Equations 19a, 19b, 21a, 22, and 23a. Disadvantageously, the computation of the normalized spatial activity measures N_Sact_(j) via TM5 allocates bits to macroblocks within a picture based only on spatial activity (texture) and does not take motion into consideration. In addition, as will be explained in greater detail later in connection with Equation 23a, the TM5 process disadvantageously uses an inappropriate value in the computation of an average of the spatial activity measures Savg_act_(j) due to limitations in the processing sequence, which is explained in greater detail later in connection with FIGS. 8A and 8B.

In another embodiment, the process computes the normalized spatial activity measures N_Sact_(j) in accordance with Equations 20a, 21b, 21c, 22, and 23b. The combination of the motion activity measure used for computation of the reference quantization parameter Q_(j) with the modulation effect achieved through the normalized spatial activity measure advantageously permits bits to be allocated within a picture to macroblocks not only based on spatial activity (texture), but also based on motion. This can dramatically improve a picture. For example, when only spatial activity is used, areas of a picture with rapid motion, such as an area corresponding to an athlete's legs in a sporting event, are typically allocated relatively few bits, which results in visual artifacts such as a “blocky” appearance. This happens because areas of pictures with rapid motion typically exhibit relatively high spatial activity (high texture), and are then allocated relatively few bits. In addition, as will be described later in connection with Equation 23b, one embodiment further uses the actual values for spatial activity measures, which advantageously results in a better match between targeted bits and actually encoded bits, thereby decreasing the likelihood of buffer overrun or buffer underrun.

In the state 619, the activity corresponds to spatial activity within the picture to determine the texture of the picture. A variety of techniques can be used to compute the spatial activity. For example, the process can compute the spatial activity in accordance with the techniques disclosed in Chapter 10 of Test Model 5 or in accordance with new techniques that are described herein. Equation 19a illustrates a computation for the spatial activity of a macroblock j from luminance frame-organized sub-blocks and field-organized sub-blocks as set forth in Chapter 10 of Test Model 5. The intra picture spatial activity of the j-th macroblock, i.e., the texture, is based on the sub-block variances expressed in Equation 19b, which corresponds to the computation that is used in TM5.

$\begin{matrix}{{act}_{j} = {1 + {\min ( {{vblk}_{1},{vblk}_{2},\ldots \mspace{11mu},{vblk}_{8}} )}}} & ( {{{Eq}.\mspace{14mu} 19}a} ) \\{{vblk}_{n} = {\frac{1}{64} \cdot {\sum\limits_{k = 1}^{64}( {P_{k}^{n} - {P\_ mean}_{n}} )^{2}}}} & ( {{{Eq}.\mspace{14mu} 19}b} )\end{matrix}$

A formula for computing the value of P_mean_(n) is expressed later in Equation 21a. The values for P_(k) ^(n) correspond to the sample values from pixels in the n-th original 8 by 8 sub-block. Equation 19b computes the spatial activity via computation of a variance, which is referred to as L2-norm. Disadvantageously, the computation expressed in Equation 19b is relatively complicated and CPU intensive, which can make real-time encoding difficult with relatively slow general purpose CPUs, such as microprocessors. This can be a drawback when video encoding is performed in real time and with full resolution and picture rates. As a result, real time video encoding is typically performed in conventional systems with dedicated hardware. Although dedicated hardware video encoders can process video at relatively high speeds, dedicated hardware is relatively more expensive, less supportable, and harder to upgrade than a software solution that can be executed by a general-purpose electronic device, such as a personal computer. Thus, video encoding techniques that can efficiently process video can advantageously permit a general-purpose electronic device to encode video in real time.
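The TM5-style variance computation of Equations 19a, 19b, and 21a can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, and the caller is assumed to have already extracted the eight 8 by 8 luminance sub-blocks (frame-organized and field-organized) as flat lists of 64 sample values.

```python
def sub_block_variance(block):
    """Equations 19b and 21a: variance vblk_n of one 8 by 8 sub-block
    given as a flat list of 64 luminance samples."""
    if len(block) != 64:
        raise ValueError("an 8 by 8 sub-block has 64 samples")
    mean = sum(block) / 64.0                               # Equation 21a
    return sum((p - mean) ** 2 for p in block) / 64.0      # Equation 19b


def tm5_activity(sub_blocks):
    """Equation 19a: act_j = 1 + min(vblk_1, ..., vblk_8)."""
    if len(sub_blocks) != 8:
        raise ValueError("expected eight sub-blocks")
    return 1.0 + min(sub_block_variance(b) for b in sub_blocks)


# Assumed sample data: seven textured sub-blocks and one flat sub-block.
textured = [0] * 32 + [2] * 32      # mean 1, variance 1
flat = [128] * 64                   # variance 0
act_all_textured = tm5_activity([textured] * 8)
act_with_flat = tm5_activity([textured] * 7 + [flat])
```

The squaring inside the per-sample loop is the part of the computation that the surrounding text identifies as relatively costly on a general-purpose CPU.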

Equation 20a illustrates a computation for the spatial activity of macroblock j according to one embodiment. Another embodiment uses sums of absolute differences, instead of the sums of squares of differences illustrated in Equations 19a and 19b, to compute the spatial activity of macroblock j. Equation 20b illustrates a computation for the motion activity of macroblock j according to one embodiment.

$\begin{matrix}{{Sact}_{j} = {\sum\limits_{k = 1}^{256}{| {P_{k}^{j} - {P\_ mean}_{j}} |}}} & ( {{{Eq}.\mspace{14mu} 20}a} ) \\{{Mact}_{j} = {\sum\limits_{k = 1}^{256}{| {P_{k}^{j} - {P\_ mean}_{j}} |}}} & ( {{{Eq}.\mspace{14mu} 20}b} )\end{matrix}$

In Equation 20a, the P_(k) ^(j) values correspond to original luminance data. In Equation 20b, the P_(k) ^(j) values correspond to either original luminance data or to motion-compensated luminance data depending on the type of macroblock. The P_(k) ^(j) values correspond to sample values for the j-th 16 by 16 original luminance data when the macroblock is an intra macroblock. When the macroblock is an inter macroblock, the P_(k) ^(j) values correspond to 16 by 16 motion compensated luminance data. A formula for computing the value of P_mean_(j) is expressed later in Equations 21b and 21c.

Moreover, the computations expressed in Equations 20a and 20b can advantageously permit a general-purpose electronic device to perform full picture rate and relatively high resolution video encoding using the described rate control and quantization control process in real time using software. It will be understood that the computations expressed in Equations 20a and 20b can also be used in non-real time applications and in dedicated hardware. One embodiment of a video encoding process, which was implemented in software and executed by an Intel® Pentium® 4 processor with a 3 GHz clock speed, efficiently and advantageously encoded a PAL, a SECAM, or an NTSC video data stream with a full picture rate and with full resolution (720×480 pixels) in real time.

The computations expressed in Equations 20a and 20b compute the sum of absolute differences (SAD), which is also known as an L1-norm calculation. Although the computation of the SAD can also be relatively complex, selected processors or CPUs include a specific instruction that permits the computation of the SAD in a relatively efficient manner. In one embodiment, the general-purpose electronic device corresponds to a personal computer with a CPU that is compatible with the Streaming Single Instruction/Multiple Data (SIMD) Extensions (SSE) instruction set from Intel Corporation. In another embodiment, the CPU of the general-purpose electronic device is compatible with an instruction that is the same as or is similar to the “PSADBW” instruction for packed sum of absolute differences (PSAD) of the SSE instruction set. Examples of CPUs that are compatible with some or all of the SSE instruction set include the Intel® Pentium® III processor, the Intel® Pentium® 4 processor, the Intel® Xeon™ processor, the Intel® Centrino™ processor, selected versions of the Intel® Celeron® processor, selected versions of the AMD Athlon™ processor, selected versions of the AMD Duron™ processor, and the AMD Opteron™ processor. It will be understood that future CPUs that are currently in development or have yet to be developed can also be compatible with the SSE instruction set. It will also be understood that new instruction sets can be included in new processors and these new instruction sets can remain compatible with the SSE instruction set.

Equation 21a expresses a calculation for the average of the sample values as used in Equation 19b. Equations 21b and 21c express calculations for the average of the sample values as used in Equations 20a and 20b.

$\begin{matrix}{{P\_ mean}_{n} = {\frac{1}{64} \cdot {\sum\limits_{k = 1}^{64}P_{k}^{n}}}} & ( {{{Eq}.\mspace{14mu} 21}a} ) \\{{P\_ mean}_{j} = {\frac{1}{256} \cdot {\sum\limits_{k = 1}^{256}P_{k}^{j}}}} & ( {{{Eq}.\mspace{14mu} 21}b} ) \\{{P\_ mean}_{j} = 0} & ( {{{Eq}.\mspace{14mu} 21}c} )\end{matrix}$

In one embodiment, the process performs a computation for the average of the sample values in the n-th original 8 by 8 sub-block P_mean_(n) according to TM5 as expressed by Equation 21a. In another embodiment, the process computes the average of sample values P_mean_(j) via Equations 21b and 21c. Advantageously, Equations 21b and 21c combine spatial activity (texture) computations and motion estimation computations. Equation 21b is used when the macroblock corresponds to an intra macroblock. Equation 21c is used when the macroblock corresponds to an inter macroblock.
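The sum-of-absolute-differences activity measure of Equations 20a and 20b, together with the mean selection of Equations 21b and 21c, can be sketched as follows. This is a minimal scalar illustration; the function name `sad_activity` and the flat-list input are assumptions, and a practical encoder would be expected to use a packed instruction such as PSADBW rather than a per-sample loop.

```python
def sad_activity(samples, intra):
    """Sum of absolute differences over one 16 by 16 macroblock.

    samples -- 256 luminance values: original data for an intra
               macroblock, motion-compensated data for an inter
               macroblock
    intra   -- True selects the mean of Equation 21b, False selects
               the zero mean of Equation 21c
    """
    if len(samples) != 256:
        raise ValueError("a 16 by 16 macroblock has 256 samples")
    mean = sum(samples) / 256.0 if intra else 0.0
    return sum(abs(p - mean) for p in samples)


# Assumed sample data: a macroblock whose samples all equal 1.
uniform = [1] * 256
sad_intra = sad_activity(uniform, intra=True)
sad_inter = sad_activity(uniform, intra=False)
```

For the uniform sample block, the intra measure is zero because every sample equals the mean, whereas the inter measure equals the sum of the sample magnitudes.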

Equation 22 expresses a computation for the normalized spatial activity measures N_Sact_(j). The normalized spatial activity measures N_Sact_(j) are used in a state 621 to compute the quantization that is applied to the discrete cosine transform (DCT) coefficients.

$\begin{matrix}{{N\_ Sact}_{j} = \frac{( {2 \cdot {Sact}_{j}} ) + {Savg\_ act}}{{Sact}_{j} + ( {2 \cdot {Savg\_ act}} )}} & ( {{Eq}.\mspace{14mu} 22} )\end{matrix}$

As expressed in Equation 22, the normalized spatial activity measures N_Sact_(j) for the j-th macroblock are computed from the spatial activity measure Sact_(j) for the macroblock and from an average of the spatial activity measures Savg_act. The average of the spatial activity measures Savg_act can be computed by Equation 23a or by Equation 23b.

$\begin{matrix}{{Savg\_ act} = {\frac{1}{MB\_ cnt} \cdot {\sum\limits_{j = 1}^{MB\_ cnt}{Sact}_{j}^{previous}}}} & ( {{{Eq}.\mspace{14mu} 23}a} )\end{matrix}$

The computation expressed in Equation 23a represents the computation described in TM5 and uses the spatial activity measures Sact_(j) from the previous picture and not from the present picture. As a result, conventional encoders that comply with TM5 compute the normalized spatial activity measures N_Sact_(j) expressed in Equation 22 relatively inaccurately. When a value for the average of the spatial activity measures Savg_act_(j) is calculated via Equation 23a, the normalized spatial activity measure N_Sact_(j) represents an estimate for normalization, rather than an actual calculation for normalization. The estimate provided in Equation 23a is particularly poor when the scene has changed from the previous picture to the current picture. As taught in TM5, a value of 400 can be used to initialize the average of the spatial activity measures Savg_act_(j) for the first picture when the average of the spatial activity measures Savg_act_(j) is computed from the previous picture.

Encoding via the process described in TM5 uses the previous picture for the average of the spatial activity measures Savg_act_(j) because the processing sequence described in TM5 processes macroblocks one-by-one as the TM5 process encodes each macroblock, such that a value for the average of the spatial activity measures Savg_act_(j) is not available at the time of the computation and use of the value for the normalized spatial activity measures N_Sact_(j). Further details of an alternate processing sequence will be described in greater detail later in connection with FIGS. 8A and 8B. The computation expressed in Equation 23b represents an improvement over the TM5-based computation expressed in Equation 23a.

$\begin{matrix}{{Savg\_ act} = {\frac{1}{MB\_ cnt} \cdot {\sum\limits_{j = 1}^{MB\_ cnt}{Sact}_{j}^{current}}}} & ( {{{Eq}.\mspace{14mu} 23}b} )\end{matrix}$

In one embodiment, the sequence of processing of macroblocks is advantageously rearranged as will be described later in connection with FIGS. 8A and 8B. This rearrangement permits the average of the spatial activity measures Savg_act_(j) to be computed from the spatial activity measures Sact_(j) of the macroblocks in the current picture such that the value for the normalized spatial activity measures N_Sact_(j) is actually normalized rather than estimated. This advantageously permits the data to be relatively predictably quantized such that the amount of data used to encode a picture more accurately follows the targeted amount of data. This further advantageously reduces and/or eliminates irregularities and distortions to the values for the variables d_(j) ^(i), d_(j) ^(p), and d_(j) ^(b) that represent the virtual buffer fullness for I-pictures, for P-pictures, and for B-pictures, respectively. In addition, it should be noted that the computation for the average of the spatial activity measures Savg_act_(j) expressed in Equation 23b does not need to be initialized with an arbitrary value, such as a value of 400, because the actual average is advantageously computed from the spatial activity measures Sact_(j) of the picture that is currently being encoded. Advantageously, the rearrangement also permits calculation of actual motion activity measures, which are needed for the calculation of virtual buffer fullness status as shown in Equations 13-17. The process advances from the state 619 to the state 621.
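The normalization of Equation 22 with the current-picture average of Equation 23b can be sketched as follows. This is a minimal illustration; the function name and the list-based input are assumptions, and the sketch assumes that at least one macroblock has nonzero spatial activity so that no denominator is zero.

```python
def normalized_spatial_activity(sact):
    """Return N_Sact_j for every macroblock of the current picture.

    sact -- spatial activity measures Sact_j of all macroblocks of the
            picture that is being encoded, gathered before quantization
    """
    mb_cnt = len(sact)
    savg_act = sum(sact) / mb_cnt                  # Equation 23b
    return [(2.0 * s + savg_act) / (s + 2.0 * savg_act)   # Equation 22
            for s in sact]


# Assumed sample picture with one flat and one busy macroblock.
n_sact = normalized_spatial_activity([0.0, 20.0])
```

Because all of the Sact_(j) values of the picture are collected first, no seed value such as 400 is needed. The result lies between 0.5 for a macroblock with no activity and a limit of 2 for a macroblock whose activity is far above the average.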

In the state 621, the process computes the quantization parameter mquant_(j). The quantization parameter mquant_(j) is used to quantize the encoded macroblock j. It will be understood that the quantization parameter mquant_(j) can be used in the state 621 or can be stored and used later. Equation 23 expresses a computation for the quantization parameter mquant_(j).

mquant _(j) =Q _(j) ·N_Sact _(j)  (Eq. 23)

In Equation 23, Q_(j) corresponds to the reference quantization parameter described earlier in connection with Equation 18 and N_Sact_(j) corresponds to the normalized spatial activity measure described earlier in connection with Equation 22. In one embodiment, the process further inspects the computed quantization parameter mquant_(j) and limits its value to prevent undesirable clipping of a resulting quantized level QAC(i,j). For example, where one embodiment of the process is used to encode video according to the MPEG-1 standard, the process detects that the calculated value for the quantization parameter mquant_(j) corresponds to 2, and automatically substitutes a value of 4. The quantization parameter mquant_(j) is later used in the macroblock encoding process to generate values for the quantized level QAC(i,j). However, in MPEG-1, a value for the quantized level QAC(i,j) is clipped to the range between −255 and 255 to fit within 8 bits. This clipping of data can result in visible artifacts, which can advantageously be avoided by limiting the value of a quantization parameter mquant_(j) to a value that prevents the clipping of the resulting quantized level, thereby advantageously improving picture quality.
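The quantizer computations of Equations 18 and 23, together with the MPEG-1 substitution described above, can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and the `mpeg1` flag are hypothetical, and the rounding to an integer and the clamp to the range of 1 to 31 follow common TM5 practice rather than anything stated in this description.

```python
def reference_quantizer(d_j, r):
    """Equation 18: Q_j = d_j * 31 / r."""
    return d_j * 31.0 / r


def macroblock_quantizer(q_j, n_sact_j, mpeg1=False):
    """Equation 23: mquant_j = Q_j * N_Sact_j, followed by limiting."""
    # Assumption: round to an integer and clamp to 1..31 as in TM5.
    mquant = int(round(q_j * n_sact_j))
    mquant = max(1, min(31, mquant))
    # Described embodiment: for MPEG-1, a computed value of 2 is
    # replaced by 4 to avoid clipping of the quantized level QAC(i,j).
    if mpeg1 and mquant == 2:
        mquant = 4
    return mquant


# Assumed sample values: r = 31 and d_j = 10 give Q_j = 10.
q_j = reference_quantizer(10.0, 31.0)
mquant_plain = macroblock_quantizer(q_j, 1.0)
mquant_mpeg1 = macroblock_quantizer(q_j, 0.2, mpeg1=True)
```

Only the single substitution stated in the text is modeled; any broader limiting policy would be a design decision of the particular encoder.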

In one embodiment, the process can further reset values for occupancy of virtual buffers (d_(j) ^(i), d_(j) ^(p), and d_(j) ^(b)) and for the quantization parameter mquant_(j) in response to selected stimuli as will be described in greater detail later in connection with FIG. 9A. The process advances from the state 621 to a state 623.

In the state 623, the process encodes the j-th macroblock. The process encodes the j-th macroblock using the quantization parameter mquant_(j) computed earlier in the state 621. The encoding techniques can include, for example, the computation of discrete cosine transforms, motion vectors, and the like. In one embodiment, the process can selectively skip the encoding of macroblocks in B-pictures as will be described in greater detail later in connection with FIG. 11. The process advances from the state 623 to a decision block 625.

In the decision block 625, the process determines whether all the macroblocks in the picture have been processed by encoding in the state 623 or by skipping as will be described in connection with FIG. 11. The process proceeds from the decision block 625 to a state 627 when the process has completed the encoding or skipping processing of the macroblocks in the picture. Otherwise, the process returns from the decision block 625 to the state 614 to continue to process the next macroblock.

In the state 627, the process stores the final occupancy value of the virtual buffers as an initial condition for encoding of the next picture of the same type. For example, the final occupancy value for the relevant virtual buffer of the present frame, i.e., the value for d_(j) ^(i), d_(j) ^(p), or d_(j) ^(b), when j is equal to MB_cnt, is saved so that it can be used as a starting value for d₀ ^(i), d₀ ^(p), or d₀ ^(b), respectively, for the next picture of the same type. In some circumstances, the number of bits used for encoding can be relatively low for a sustained period of time so that bit or byte stuffing is used to increase the number of bits used in encoding. This prevents a buffer overrun condition in the decoder buffer. However, the use of bit stuffing can undesirably distort the occupancy value in the corresponding virtual buffer, which can then result in instability in the encoder. In one embodiment, the rate control and quantization control process includes one or more techniques that advantageously ameliorate the effects of bit stuffing. Examples of such techniques will be described in greater detail later in connection with FIGS. 9A and 9B. The process advances from the state 627 to a decision block 630.

In the decision block 630, the illustrated process has completed the processing for the picture and determines whether the picture that was processed corresponds to the last picture in the group of pictures (GOP). This can be accomplished by monitoring the values remaining in the number of P-pictures N_(p) and the number of B-pictures N_(b) described earlier in connection with the state 606. The process proceeds from the decision block 630 to a state 632 when there are pictures that remain to be processed in the group of pictures. Otherwise, i.e., when the process has completed processing of the group of pictures, the process proceeds from the decision block 630 to a decision block 634.

In the state 632, the process updates the appropriate value in the number of P-pictures N_(p) or the number of B-pictures N_(b) and advances to a state 636 to initiate the processing of the next picture in the group of pictures. It will be understood that the next picture to be processed may not be the next picture to be displayed because of possible reordering of pictures during encoding.

In the state 636, the process updates the corresponding complexity estimators X_(i), X_(p), and X_(b) based on the picture that has just been encoded. For example, if an I-picture had just been encoded, the process updates the complexity estimator X_(i) for the I-pictures as expressed in Equation 24. If the picture that had just been encoded was a P-picture or was a B-picture, the process updates the corresponding complexity estimator X_(p) or X_(b), respectively, as expressed in Equation 25 and in Equation 26.

X_(i)=S_(i)Q_(i)  (Eq. 24)

X_(p)=S_(p)Q_(p)  (Eq. 25)

X_(b)=S_(b)Q_(b)  (Eq. 26)

In Equations 24, 25, and 26, the value of S_(i), S_(p), or S_(b) corresponds to the number of bits generated or used to encode the picture for a picture of type I-picture, P-picture, or B-picture, respectively. The value of Q_(i), Q_(p), or Q_(b) corresponds to the average of the values for the quantization parameter mquant_(j) that were used to quantize the macroblocks in the picture. The process advances from the state 636 to a state 638.
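The update of Equations 24 through 26 can be sketched in Python as follows. The function name update_complexity and the use of a dictionary keyed by picture type are hypothetical choices made for this sketch only.

```python
def update_complexity(estimators: dict, picture_type: str,
                      bits_used: int, avg_mquant: float) -> dict:
    """Sketch of Eqs. 24-26: X = S * Q for the picture type just encoded.

    estimators maps "i", "p", "b" to the current X_i, X_p, X_b values.
    Only the entry for the picture type just encoded is replaced.
    """
    if picture_type not in ("i", "p", "b"):
        raise ValueError("picture_type must be 'i', 'p', or 'b'")
    updated = dict(estimators)  # leave the caller's dictionary untouched
    updated[picture_type] = bits_used * avg_mquant
    return updated
```

For example, after a P-picture encoded with 100,000 bits at an average mquant_(j) of 8.5, only the X_(p) entry changes, to 850,000.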

In the state 638, the process updates the remaining number of bits R allocated to the group of pictures. The update to the remaining number of bits R allocated to the group of pictures depends on whether the next picture to be encoded is a picture from the existing group of pictures or whether the next picture to be encoded is the first picture in a new group of pictures. Both Equations 27 and 28 are used when the next picture to be processed is the first picture in a new group of pictures. When the next picture to be processed is another picture in the same group of pictures as the previously processed picture, then only Equation 27 is used. It will be understood that Equations 27 and 28 represent assignment statements for the value of R, such that a new value for R is represented to the left of the “=” sign and a previous value for R is represented to the right of the “=” sign.

R=R−S_((i,p,b))  (Eq. 27)

R=G+R  (Eq. 28)

In Equation 27, the process computes the new value for the remaining number of bits R allocated to the group of pictures by taking the previous value for R and subtracting the number of bits S_((i,p,b)) that had been used to encode the picture that had just been encoded. The number of bits S_((i,p,b)) that had been used to encode the picture is also used to calculate the VBV buffer model occupancy as will be described in greater detail later in connection with FIG. 7. The computation expressed in Equation 27 is performed for each picture after it has been encoded. When the picture that has just been encoded is the last picture in a group of pictures such that the next picture to be encoded is the first picture in a new group of pictures, the computation expressed in Equation 27 is further nested with the computation expressed in Equation 28. In Equation 28, the process adds to a remaining amount in R, which can be positive or negative, a value of G. The variable G was described earlier in connection with Equation 5. The value of G is based on the new group of pictures to be encoded and corresponds to the number of bits that can be transferred by the data channel in the amount of time corresponding to the length of the presentation time for the new group of pictures. The process returns from the state 638 to the state 610 to continue the video encoding process as described earlier.
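The two assignment statements can be sketched in Python as follows. The function name update_remaining_bits and the convention of passing the value of G only at a group-of-pictures boundary are assumptions made for this sketch.

```python
from typing import Optional


def update_remaining_bits(r: float, bits_used: float,
                          new_gop_bits: Optional[float] = None) -> float:
    """Sketch of Eqs. 27 and 28 for the remaining bit budget R."""
    r = r - bits_used            # Eq. 27: applied after every picture
    if new_gop_bits is not None:
        r = new_gop_bits + r     # Eq. 28: only when a new GOP starts next
    return r
```

For example, a group of pictures that overspends its budget by 200 bits carries the deficit forward: update_remaining_bits(100, 300, 5000) returns 4800.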

Returning now to the decision block 634, at this point in the process, the process has completed the encoding of a picture that was the last picture in a group of pictures. In the decision block 634, the process determines whether it has completed the encoding of the video sequence. It will be understood that the process can be used to encode video of practically indefinite duration, such as broadcast video, and can continue to encode video endlessly. The process proceeds from the decision block 634 to a state 640 when there is another group of pictures to be processed. Otherwise, the process ends.

In the state 640, the process receives the next group of pictures. It will be understood that in another embodiment, the process may retrieve only a portion of the next group of pictures in the state 640 and retrieve remaining portions later. In one embodiment, the state 640 is relatively similar to the state 602. The process advances from the state 640 to a state 642.

In the state 642, the process receives the mode or type of encoding that is to be applied to the pictures in the group of pictures. In the illustrated rate control and quantization control process, the decision as to which mode or type of encoding is to be used for each picture in the group of pictures is made before the pictures are processed by the rate control and quantization control process. In one embodiment, the state 642 is relatively similar to the state 604. The process advances from the state 642 to a state 644.

In the state 644, the process determines the number of P-pictures N_(p) and the number of B-pictures N_(b) in the next group of pictures to be encoded. In one embodiment, the state 644 is relatively similar to the state 606. The process advances from the state 644 to the state 636, which was described in greater detail earlier, to continue with the encoding process.

Control with VBV Buffer Model Occupancy Levels

FIG. 7 is a flowchart that generally illustrates a process for adjusting a targeted bit allocation based on an occupancy level of a virtual buffer. To illustrate the operation of the process, the process will be described in connection with MPEG-1 and MPEG-2 video encoding so that the virtual buffer corresponds to the video buffer verifier (VBV) buffer model. The VBV buffer model is a conceptual model that is used by the encoder to model the buffer occupancy levels in a decoder. It will be apparent to one of ordinary skill in the art that other buffer models can be used with other video encoding standards. Monitoring of VBV buffer model levels will be described now in greater detail before further discussion of FIG. 7.

As described earlier in connection with FIG. 4, the VBV buffer model anticipates or predicts buffer levels in the decoder buffer. The occupancy level of the decoder buffer is approximately inverse to the occupancy level of the encoder buffer, such that a relatively high occupancy level in the VBV buffer model indicates that relatively few bits are being used to encode the video sequence, and a relatively low occupancy level in the VBV buffer model indicates that relatively many bits are being used to encode the video sequence.

The occupancy level V_(status) of the VBV buffer model is computed and monitored. In one embodiment, the occupancy level V_(status) of the VBV buffer model is compared to a predetermined threshold, and the encoding can be adapted in response to the comparison as will be described in greater detail later in connection with FIG. 11. In another embodiment, the occupancy level V_(status) of the VBV buffer model is used to adaptively adjust a target number of bits T_(i), T_(p), or T_(b) for a picture to be encoded. A computation for the occupancy level V_(status) is expressed in Equation 29.

$\begin{matrix}{V_{status} = {V_{status} - S_{({i,p,b})} + \frac{bit\_ rate}{picture\_ rate}}} & ( {{Eq}.\mspace{14mu} 29} )\end{matrix}$

Equation 29 represents an assignment statement for the value of the occupancy level V_(status). A new value for the occupancy level V_(status) is represented to the left of the “=” sign, and a previous value for the occupancy level V_(status) is represented to the right of the “=” sign. In one embodiment, the value of the occupancy level V_(status) is initialized to a target value for the VBV buffer model. An example of a target value is ⅞ of the full capacity of the VBV buffer model. In another embodiment, the value of V_(status) is initialized to a buffer occupancy that corresponds to a specified VBV-delay value. Other initialization values can be readily determined by one of ordinary skill in the art.

In Equation 29, the occupancy of the VBV buffer model is computed as follows. The number of bits S_((i,p,b)) that had been used to encode the picture just encoded is subtracted from the previous value for the occupancy level V_(status), and the number of bits that would be transmitted in the time period corresponding to a “frame” or picture is added to the value for the occupancy level V_(status). As illustrated in Equation 29, the number of bits that would be transmitted in the frame period is equal to the bit rate times the inverse of the frame rate. The computation expressed in Equation 29 is adapted to update the occupancy level V_(status) for each picture processed. In another embodiment, the expression is modified to update the occupancy level V_(status) less often than for every picture, such as for every other picture.
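The per-picture update of Equation 29 can be sketched in Python as follows. The function name update_vbv_status is hypothetical, and the numbers in the usage note are illustrative only.

```python
def update_vbv_status(v_status: float, bits_used: float,
                      bit_rate: float, picture_rate: float) -> float:
    """Sketch of Eq. 29: remove the bits of the picture just encoded and
    add the bits delivered by the channel during one picture period."""
    bits_per_picture_period = bit_rate / picture_rate
    return v_status - bits_used + bits_per_picture_period
```

For example, with a 4,000,000 bit/s channel at 25 pictures per second, 160,000 bits arrive per picture period, so a picture encoded with 200,000 bits lowers an occupancy of 1,000,000 bits to 960,000 bits.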

As will be described later in connection with FIG. 7, one embodiment of the process compares the target number of bits for a picture T_(i), T_(p), or T_(b) to a threshold T_(mid), and adjusts the target number of bits T_(i), T_(p), or T_(b) in response to the comparison. This advantageously assists the video encoder to produce a data stream that is compliant with VBV to protect against buffer underrun or buffer overrun in the decoder.

One embodiment uses five parameters related to VBV buffer model occupancy levels for control. It will be understood that in other embodiments, fewer than five parameters or more than five parameters can also be used. The parameters can vary in a very broad range and can include fixed parameters, variable parameters, adaptable parameters, user-customizable parameters, and the like. In one embodiment, the following parameters are used (in decreasing order of occupancy): V_(high), V_(target), V_(mid), V_(low), and V_(critical).

V_(high) corresponds to a relatively high value for the occupancy of the VBV buffer model. In one embodiment, the process strives to control encoding such that the occupancy of the VBV buffer model is maintained below V_(high).

V_(target) corresponds to an occupancy level for the VBV buffer model that is desired. In one embodiment, the desired buffer occupancy level V_(target) can be configured by a user.

V_(mid) corresponds to an occupancy level that is about half of the capacity of the VBV buffer model.

V_(low) corresponds to a relatively low value for the occupancy of the VBV buffer model. In one embodiment, the process strives to control encoding such that the occupancy of the VBV buffer model is maintained above V_(low).

V_(critical) corresponds to an even lower occupancy level than V_(low). In one embodiment, when the occupancy of the VBV buffer model falls below V_(critical), the process proceeds to skip macroblocks in B-pictures as will be described in greater detail later in connection with FIG. 11.

Table II illustrates sample values for threshold levels. Other suitable values will be readily determined by one of ordinary skill in the art.

TABLE II

Threshold       Sample Value
V_(high)        about 63/64 of VBV buffer model size
V_(target)      about ⅞ of VBV buffer model size
V_(mid)         about ½ of VBV buffer model size
V_(low)         about ⅜ of VBV buffer model size
V_(critical)    about ¼ of VBV buffer model size

The sample values listed in Table II are advantageously scaled to the VBV buffer model size. As described in greater detail earlier in connection with FIG. 4, the VBV buffer model size is approximately 224 kB for MPEG-2 and is approximately 40 kB for MPEG-1. It will be understood by one of ordinary skill in the art that the size of a virtual buffer model, such as the VBV buffer model for MPEG-1 and MPEG-2, can vary according to the video encoding standard used and the application scenario.
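The scaling of the sample values of Table II to a given buffer size can be sketched in Python as follows. The names SAMPLE_FRACTIONS and vbv_thresholds, and the truncation to whole bits, are assumptions made for this sketch. The fractions are the "about" values of Table II taken as exact.

```python
from fractions import Fraction

# Sample fractions from Table II, listed in decreasing order of occupancy.
SAMPLE_FRACTIONS = {
    "high": Fraction(63, 64),
    "target": Fraction(7, 8),
    "mid": Fraction(1, 2),
    "low": Fraction(3, 8),
    "critical": Fraction(1, 4),
}


def vbv_thresholds(vbv_size_bits: int) -> dict:
    """Scale the Table II sample fractions to a VBV buffer model size."""
    # Truncating to whole bits is an assumption of this sketch.
    return {name: int(frac * vbv_size_bits)
            for name, frac in SAMPLE_FRACTIONS.items()}
```

For example, vbv_thresholds(64000) gives 63000, 56000, 32000, 24000, and 16000 bits for V_(high), V_(target), V_(mid), V_(low), and V_(critical), respectively.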

Returning now to FIG. 7, the process illustrated in FIG. 7 adjusts a targeted bit allocation T_(i), T_(p), or T_(b) for a picture based at least in part on the occupancy level V_(status) of the VBV buffer model. In one embodiment, the process illustrated in FIG. 7 is incorporated in the state 610 of the process illustrated in FIG. 6. The process can start at an optional decision block 710, where the process compares the value of the targeted bit allocation T_(i), T_(p), or T_(b) (generically written as T_((i,p,b)) in FIG. 7) to one or more target thresholds, such as to T_(mid) or to T_(high). For example, the target threshold T_(mid) can be selected such that the adjustment process is invoked when the VBV buffer model occupancy level is relatively low. In another example, the target threshold T_(high) can be selected such that the adjustment process is invoked when the VBV buffer model occupancy is relatively high. In one embodiment, only one of the target thresholds T_(mid) or T_(high) is used; in another embodiment, both target thresholds are used; and in yet another embodiment, the optional decision block 710 is not present and neither target threshold is used. In the illustrated embodiment, the adjustment process is invoked in response to the VBV buffer model occupancy level and to the number of bits allocated to the picture to be encoded. The computation of the targeted bit allocation T_(i), T_(p), or T_(b) can be performed as described earlier in connection with the state 610 and Equations 6, 7, and 8 of FIG. 6. Equation 30a expresses a sample computation for the target threshold T_(mid). Equation 30b expresses a sample computation for the target threshold T_(high).

$\begin{matrix}{T_{mid} = {V_{status} - V_{mid}}} & ( {{{Eq}.\mspace{14mu} 30}a} ) \\{T_{high} = {V_{status} - V_{high} + \frac{bit\_ rate}{picture\_ rate}}} & ( {{{Eq}.\mspace{14mu} 30}b} )\end{matrix}$

The illustrated embodiment of the process proceeds from the optional decision block 710 to a state 720 when the targeted bit allocation T_(i), T_(p), or T_(b) exceeds the target threshold T_(mid) or when the targeted bit allocation T_(i), T_(p), or T_(b) is less than the target threshold T_(high). It will be understood that in another embodiment or configuration, where the optional decision block 710 is not present, the process can start at the state 720. When the targeted bit allocation T_(i), T_(p), or T_(b) exceeds the target threshold T_(mid), the VBV buffer model occupancy is relatively low. In the illustrated embodiment, the target threshold T_(mid) is selected such that the adjustment to the targeted bit allocation occurs when a picture is allocated enough bits such that, without adjustment, the VBV buffer model occupancy would fall or would stay below V_(mid). Other thresholds will be readily determined by one of ordinary skill in the art.

When the targeted bit allocation T_(i), T_(p), or T_(b) does not exceed the target threshold T_(mid) and the targeted bit allocation T_(i), T_(p), or T_(b) is not less than the target threshold T_(high), the illustrated process proceeds from the optional decision block 710 to a decision block 730. It will be understood that where the optional decision block 710 is not present or is not used, the process can begin at the state 720, which then proceeds to the decision block 730. In another embodiment, when the targeted bit allocation T_(i), T_(p), or T_(b) does not exceed the target threshold T_(mid) and the targeted bit allocation T_(i), T_(p), or T_(b) is not less than the target threshold T_(high), the process proceeds to end from the optional decision block 710, such as, for example, by proceeding to the state 612 of the process described in connection with FIG. 6. In the illustrated optional decision block 710, the comparison uses the same target thresholds T_(mid) and/or T_(high) for I-pictures, for P-pictures, and for B-pictures. In another embodiment, the target thresholds T_(mid) and/or T_(high) vary depending on the picture type.
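The comparison of the optional decision block 710, using the sample thresholds of Equations 30a and 30b, can be sketched in Python as follows. The function name needs_adjustment is hypothetical, and the sketch assumes the embodiment in which both target thresholds are used.

```python
def needs_adjustment(target_bits: float, v_status: float, v_mid: float,
                     v_high: float, bit_rate: float,
                     picture_rate: float) -> bool:
    """Sketch of decision block 710 with both thresholds in use.

    Returns True when the process would proceed to the state 720.
    """
    t_mid = v_status - v_mid                               # Eq. 30a
    t_high = v_status - v_high + bit_rate / picture_rate   # Eq. 30b
    # Proceed to the adjustment when the allocation exceeds T_mid
    # (occupancy would end below V_mid) or is less than T_high
    # (occupancy would end above V_high).
    return target_bits > t_mid or target_bits < t_high
```

For example, with V_(mid) of 32,000 bits, V_(high) of 63,000 bits, and 4,000 bits arriving per picture period, an occupancy of 40,000 bits gives T_(mid) of 8,000 bits, so a 10,000-bit allocation triggers the adjustment and a 5,000-bit allocation does not.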

In the state 720, which is entered when the targeted bit allocation T_(i), T_(p), or T_(b) exceeds the target threshold T_(mid), or when the targeted bit allocation T_(i), T_(p), or T_(b) is less than the target threshold T_(high), the process adjusts the value of the targeted bit allocation T_(i), T_(p), or T_(b) to reduce the number of bits allocated to the picture. In another embodiment, the process starts at the state 720. For example, one embodiment of the process is configurable by a user such that the process does not have the optional decision block 710 and instead, starts at the state 720. For example, the adjustment to the targeted bit allocation T_(i), T_(p), or T_(b) can be configured to decrease the number of bits. Advantageously, when fewer bits are used to encode a picture, the VBV buffer model occupancy level, and correspondingly, a decoder's buffer occupancy level, can increase. Equation 31 illustrates a general formula for the adjustment.

T_((i,p,b))=α·T_((i,p,b))  (Eq. 31)

In Equation 31, the adjustment factor α can be less than unity such that the targeted bit allocation T_(i), T_(p), or T_(b) after adjustment is smaller than originally calculated. In one embodiment, the adjustment factor α can also correspond to values greater than unity such that the targeted bit allocation T_(i), T_(p), or T_(b) after adjustment is larger than originally calculated. For clarity, the adjustment of Equation 31 illustrates an adjustment to a separately calculated targeted bit allocation T_(i), T_(p), or T_(b). However, it will be understood that the adjustment can also be incorporated in the initial calculation of the targeted bit allocation T_(i), T_(p), or T_(b). It will be understood that Equation 31 corresponds to an assignment statement such that the value to the right of the “=” corresponds to the targeted bit allocation T_(i), T_(p), or T_(b) before adjustment, and the value to the left of the “=” corresponds to the targeted bit allocation T_(i), T_(p), or T_(b) after adjustment. Equation 32 expresses a sample computation for the adjustment factor α.

$\begin{matrix}{\alpha = {1 + \frac{V_{status} - V_{target}}{V_{high} - V_{low}}}} & ( {{Eq}.\mspace{14mu} 32} )\end{matrix}$

As illustrated in Equation 32, the adjustment factor α is less than unity when V_(status) is less than V_(target), and the adjustment factor α is greater than unity when V_(status) is greater than V_(target). A net effect of the adjustment expressed in Equation 31 is to trend the occupancy level of the VBV buffer model to the desired occupancy level V_(target).

It should be noted that when the targeted bit allocation T_(i), T_(p), or T_(b) exceeds the target threshold T_(mid) in the optional decision block 710, the value for the VBV buffer model occupancy V_(status) will typically be less than the value for the desired VBV occupancy level V_(target) such that adjustment factor α is less than unity. Advantageously, the targeted bit allocation can be reduced by an amount related to how far the VBV buffer model occupancy V_(status) is below the desired VBV occupancy level V_(target). When the targeted bit allocation T_(i), T_(p), or T_(b) is less than the target threshold T_(high), the value for the VBV buffer model occupancy V_(status) will typically be higher than the value for the desired VBV occupancy level V_(target) such that adjustment factor α is greater than unity. Advantageously, the targeted bit allocation can be increased by an amount related to how far the VBV buffer model occupancy V_(status) is above the desired VBV occupancy level V_(target). The process advances from the state 720 to the decision block 730.
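The adjustment of the state 720, using Equation 31 with the sample adjustment factor of Equation 32, can be sketched in Python as follows. The function name adjust_target is hypothetical, and the numbers in the usage note are illustrative only.

```python
def adjust_target(target_bits: float, v_status: float, v_target: float,
                  v_high: float, v_low: float) -> float:
    """Sketch of the state 720: scale the targeted bit allocation."""
    # Eq. 32: alpha < 1 below V_target, alpha > 1 above V_target.
    alpha = 1 + (v_status - v_target) / (v_high - v_low)
    # Eq. 31: assignment T = alpha * T.
    return alpha * target_bits
```

For example, with V_(target) of 56,000 bits, V_(high) of 63,000 bits, and V_(low) of 24,000 bits, an occupancy of 36,500 bits gives an adjustment factor of 0.5, so a 10,000-bit allocation becomes 5,000 bits.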

In the decision block 730, the process determines whether the targeted bit allocation T_(i), T_(p), or T_(b), with or without adjustment by the state 720, falls within specified limits. These limits can advantageously be used to prevent a value for the targeted bit allocation T_(i), T_(p), or T_(b) from resulting in buffer underrun or buffer overrun. These limits can be predetermined or can advantageously be adapted to the targeted bit allocation T_(i), T_(p), or T_(b) and the VBV buffer model occupancy level V_(status). When the targeted bit allocation T_(i), T_(p), or T_(b) falls outside the limits, the process proceeds from the decision block 730 to a state 740 to bind the targeted bit allocation T_(i), T_(p), or T_(b) to the limits. Otherwise, the process ends without further adjustment to the targeted bit allocation T_(i), T_(p), or T_(b).

Equation 33 illustrates a sample computation for an upper limit T_(max) for the targeted bit allocation T_(i), T_(p), or T_(b). Equation 34 illustrates a sample computation for a lower limit T_(min) for the targeted bit allocation T_(i), T_(p), or T_(b).

$\begin{matrix}{T_{\max} = {V_{status} - V_{low}}} & ( {{Eq}.\mspace{14mu} 33} ) \\{T_{\min} = {\max ( {{V_{status} + \frac{bit\_ rate}{picture\_ rate} - V_{high}},0} )}} & ( {{Eq}.\mspace{14mu} 34} )\end{matrix}$

It will be understood that when the targeted bit allocation T_(i), T_(p), or T_(b) exceeds the upper limit T_(max), the targeted bit allocation T_(i), T_(p), or T_(b) is reassigned the value of the upper limit T_(max), and when the targeted bit allocation T_(i), T_(p), or T_(b) is below the lower limit T_(min), the targeted bit allocation T_(i), T_(p), or T_(b) is reassigned the value of the lower limit T_(min).

The application of the upper limit T_(max) expressed in Equation 33 advantageously limits a relatively high value for the targeted bit allocation T_(i), T_(p), or T_(b) such that the VBV buffer model occupancy level stays above the lower desired occupancy limit level V_(low) for the VBV buffer model. The application of the lower limit T_(min) expressed in Equation 34 advantageously limits a relatively low value for the targeted bit allocation T_(i), T_(p), or T_(b) such that the buffer occupancy level stays below the upper desired occupancy limit level V_(high), even after accumulating data over time at the constant bit rate of the data channel. The lower limit T_(min) corresponds to the higher of the quantities separated by the comma in the expression. Other values for the upper limit T_(max) and for the lower limit T_(min) will be readily determined by one of ordinary skill in the art. It will be understood that the targeted bit allocation T_(i), T_(p), or T_(b) represents a target for the encoder to achieve and that there may be relatively small variances between the target and the number of bits actually used to encode a picture such that the buffer occupancy level V_(status) may still deviate slightly from the desired occupancy limit levels V_(low) and V_(high).
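The bounding performed in the decision block 730 and the state 740, using the sample limits of Equations 33 and 34, can be sketched in Python as follows. The function name bound_target is hypothetical. The sketch assumes that the bits arriving per picture period do not exceed V_(high) minus V_(low), so that the lower limit does not exceed the upper limit.

```python
def bound_target(target_bits: float, v_status: float, v_low: float,
                 v_high: float, bit_rate: float,
                 picture_rate: float) -> float:
    """Sketch of decision block 730 and state 740."""
    t_max = v_status - v_low                                       # Eq. 33
    t_min = max(v_status + bit_rate / picture_rate - v_high, 0)    # Eq. 34
    if target_bits > t_max:
        return t_max    # keeps the occupancy above V_low
    if target_bits < t_min:
        return t_min    # keeps the occupancy below V_high
    return target_bits  # already within the limits
```

For example, with V_(low) of 24,000 bits, V_(high) of 63,000 bits, and 4,000 bits arriving per picture period, an occupancy of 40,000 bits caps a 20,000-bit allocation at 16,000 bits, and an occupancy of 62,000 bits raises a 1,000-bit allocation to 3,000 bits.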

After processing in the state 740, the adjustment process ends. For example, where the adjustment process depicted in FIG. 7 is incorporated in the state 610 of the rate control and quantization control process illustrated in FIG. 6, the process can continue processing from the state 610.

It will be appreciated by the skilled practitioner that the illustrated process can be modified in a variety of ways without departing from the spirit and scope of the invention. For example, in another embodiment, various portions of the illustrated process can be combined, can be rearranged in an alternate sequence, can be removed, and the like. For example, in one embodiment, the optional decision block 710 is not present. In another embodiment, the decision block 730 and the state 740 are optional and need not be present.

Macroblock Processing Sequence

FIG. 8A is a flowchart that generally illustrates a sequence of processing macroblocks according to the prior art. FIG. 8B is a flowchart that generally illustrates a sequence of processing macroblocks according to one embodiment. The processing sequence illustrated in FIG. 8B advantageously permits the spatial activity and/or motion activity for the macroblocks of a picture to be calculated such that actual values can be used in computations of sums and averages as opposed to estimates of sums and averages from computations of a prior picture.

The conventional sequence depicted in FIG. 8A starts at a state 802. In the state 802, the process performs a computation for spatial activity (texture) and/or for motion estimation for a single macroblock. The process advances from the state 802 to a state 804.

In the state 804, the process uses the computation of spatial activity and/or motion estimation to perform a discrete cosine transformation (DCT) of the macroblock. The computation of spatial activity is typically normalized with a total value of spatial activity. However, at this point in the process, the computations for spatial activity have not been completed for the picture that is being encoded. As a result, an estimate from a previous picture is used. For example, the total spatial activity from the prior picture is borrowed to compute an average. In another example, motion estimation from a previous picture can also be borrowed. Whether or not these estimates are close to the actual values is a matter of chance. When there is a scene change between the prior picture and the picture that is being encoded, the estimates can be quite inaccurate. These inaccuracies can impair picture quality and lead to mismatches between the number of bits targeted for encoding of the picture and the number of bits actually used to encode the picture. These variances in the number of bits consumed to encode a picture can disadvantageously lead to buffer underrun or to buffer overrun. The process advances from the state 804 to a state 806.

In the state 806, the process performs variable length coding (VLC) for the DCT coefficients of the macroblock. The VLC compresses the DCT coefficients. The process advances from the state 806 to a decision block 808.

In the decision block 808, the process determines whether it has completed encoding all the macroblocks in the picture. The process returns from the decision block 808 to the state 802 when there are macroblocks remaining to be encoded. Otherwise, the process proceeds to end until restarted.

A rearranged sequence according to one embodiment is depicted in FIG. 8B and starts at a state 852. In the state 852, the process performs computations for spatial activity and/or motion estimation for all the macroblocks in the picture that is being encoded. This advantageously permits sums and averages of the spatial activities and/or motion estimates to be computed with actual numbers and not with estimates, and is further advantageously accurate even with a scene change before the picture that is presently encoded. In another example of advantages, in TM5, an average of the spatial activity measures Savg_act_(j) of 400 is used for the first picture as a “guess” of the measure. By processing the spatial activity of all the macroblocks before the spatial activities are used, the average of the spatial activity measures Savg_act_(j) can be directly computed and a speculative “guess” can advantageously be avoided.

Further advantageously, the use of actual sums and averages permits the actual number of bits used to encode a picture to match with the targeted bit allocation with relatively higher accuracy. This advantageously decreases the chances of undesirable buffer underrun or buffer overrun and can increase picture quality. In one embodiment, the actual motion estimation for a macroblock is used to allocate bits among the macroblocks such that macroblocks with relatively high motion are allocated a relatively high number of bits. By contrast, in a conventional system with macroblock by macroblock processing, the bits for macroblocks are typically allocated among macroblocks by the relative motion of the macroblock in a prior picture, which may or may not be accurate. The process advances from the state 852 to a state 854.

In the state 854, the process performs the DCT computations for all of the macroblocks in the picture. The process advances from the state 854 to a state 856.

In the state 856, the process performs VLC for the DCT coefficients of all of the macroblocks in the picture. The process then ends until restarted.

In another embodiment, the process performs the computation of spatial activity and/or motion estimation for all the macroblocks as described in connection with the state 852, but then loops repetitively around a state to perform DCT computations and another state to perform VLC for macroblocks until processing of the macroblocks of the picture is complete.
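The benefit of the rearranged sequence can be sketched in Python as follows: the spatial activities of all the macroblocks are gathered first, so that the average is an actual value for the current picture before any macroblock is normalized. The function name normalized_activities is hypothetical. The normalization formula shown is the one used in TM5 and is an assumption of this sketch, as Equation 22 is not reproduced in this portion of the description.

```python
def normalized_activities(spatial_activities: list) -> list:
    """Sketch of state 852 followed by per-macroblock normalization.

    spatial_activities holds Sact_j for every macroblock of the current
    picture, each assumed to be positive.
    """
    if not spatial_activities:
        raise ValueError("at least one macroblock is required")
    # Actual average for the current picture; no estimate borrowed from
    # a prior picture and no initial guess such as 400.
    avg_act = sum(spatial_activities) / len(spatial_activities)
    # Assumption: TM5-style normalization, which maps each macroblock
    # into the range 0.5 to 2.0 around the picture average.
    return [(2 * act + avg_act) / (act + 2 * avg_act)
            for act in spatial_activities]
```

A macroblock whose activity equals the picture average is normalized to 1.0, so that under this assumption its mquant_(j) equals the reference quantization parameter Q_(j).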

Bit Stuffing

Bit stuffing or byte stuffing is a technique that is commonly used by an encoder to protect against generating a data stream that would otherwise lead to a decoder buffer overrun. When the number of bits that is used to encode a picture is relatively low for a sustained period of time, the decoder retrieves data from the decoder buffer at a slower rate than the rate at which the data channel adds data to the decoder buffer. When this accumulation of data continues for a sustained period of time such that the decoder buffer fills to capacity, data carried by the data channel can be lost. An example of a sequence of pictures that can be relatively highly compressed such that bit stuffing may be invoked is a sequence of pictures in which each picture is virtually completely black. To address this disparity in data rates such that buffer overrun does not occur, the encoder embeds data in the data stream that is not used, but consumes space. This process is known as bit stuffing.

Bit stuffing can be implemented in a variety of places in an encoding process. In one embodiment, bit stuffing is implemented when appropriate after the state 632 and before the state 636 in the encoding process described in connection with FIG. 6. In one embodiment, the encoding process invokes bit stuffing when the occupancy of the VBV buffer model attains a predetermined level, such as the V_(high) level described earlier in connection with FIG. 7. In one embodiment, bit stuffing is invoked when the VBV buffer model occupancy is about 63/64 of the capacity of the VBV buffer model.
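The occupancy test that invokes bit stuffing can be sketched as follows. This is a minimal illustration, not the encoder's actual implementation: the function name, the integer arguments, and the use of a greater-than-or-equal comparison for "attains" are assumptions made for the example, and the 63/64 fraction is the example value given above.

```python
def needs_bit_stuffing(vbv_occupancy: int, vbv_capacity: int,
                       num: int = 63, den: int = 64) -> bool:
    """Return True when the VBV model occupancy has reached num/den of capacity.

    Hypothetical helper: the names and the ">=" comparison are assumptions.
    Cross-multiplication keeps the test in exact integer arithmetic.
    """
    return vbv_occupancy * den >= vbv_capacity * num


if __name__ == "__main__":
    capacity = 6400  # illustrative capacity, in arbitrary units
    print(needs_bit_stuffing(6300, capacity))  # at 63/64 of capacity
    print(needs_bit_stuffing(6299, capacity))  # just below the threshold
```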

Though beneficial in resolving decoder buffer overrun problems, bit stuffing can introduce other problems into the encoding process. In particular, including the bits used in bit stuffing in a computation of the number of bits used to encode a picture S(i,p,b) can indicate to the encoder that more bits are being used to encode the pictures than were initially targeted. This can further be interpreted as an indication to encode pictures with reduced quality to decrease the number of bits used to encode pictures. Over a period of time, this can lead to an even further decrease in the number of bits used to encode the pictures, with proportionally even more bits used in bit stuffing. With relatively many bits used in bit stuffing, relatively few bits remain to actually encode the pictures, which then reduces the quality of the encoded pictures over time.

FIG. 9A illustrates a process that advantageously stabilizes the encoding process, thereby reducing or eliminating the tendency for bit stuffing to destabilize an encoding process and the tendency for the picture quality to degrade over time. As will be described later, the process depicted in FIG. 9A can be implemented in a variety of locations within an encoding process.

It will be appreciated by the skilled practitioner that the illustrated process can be modified in a variety of ways without departing from the spirit and scope of the invention. For example, in another embodiment, various portions of the illustrated process can be combined, can be rearranged in an alternate sequence, can be removed, and the like. The process can begin at a decision block 902 or at a decision block 904. In one embodiment, only one of the decision block 902 or the decision block 904 is present in the process. In the illustrated embodiment, both the decision block 902 and the decision block 904 are present in the process. For example, the process can start at the decision block 902 prior to the encoding of a picture, and the process can start at the decision block 904 after the encoding of a picture. For example, the start of the process of FIG. 9A at the decision block 902 can be incorporated after the state 612 and before the state 614 of the rate control and quantization control process described in connection with FIG. 6. In another example, the start of the process of FIG. 9A at the decision block 904 can be incorporated at the state 627 of the process of FIG. 6.

In the decision block 902, the process determines whether there has been a scene change between the picture that is being encoded and the previous picture encoded. The determination of a scene change can be performed prior to the encoding of a picture. In one embodiment, the decision block 902 is optional. A variety of methods can be used to determine whether there has been a scene change. In one embodiment, the process reuses the results of a computation that is used to encode the picture, such as the results of a sum of absolute differences (SAD) measurement. In one embodiment, scene change detection varies according to the picture type. In one embodiment, for I-pictures, the average spatial activity Sact_avg for the current picture is compared to the corresponding previous average spatial activity. For example, when the current activity is at least 2 times or less than half that of the previous I-picture, a scene change is detected. Other factors that can be used, such as 3 times and 1/3, 4 times and 1/4, or a combination of these, will be readily determined by one of ordinary skill in the art. In addition, one embodiment imposes an additional criterion that a minimum number of pictures must pass after the previous scene change has been declared before a new scene change is declared. For P-pictures, the average of motion activity can be used instead of the average spatial activity to detect a scene change, together with a relative comparison factor such as (2, 1/2), (3, 1/3), (4, 1/4) and the like. To increase the robustness of the decision, one embodiment further uses a minimum average motion activity measure for the current P-picture, since average motion activity by itself can indicate relatively high motion, which can be attributed to a scene change. For example, values of minimum average motion activity measure in the range of about 1000 to about 4000 can be used to indicate relatively high motion.
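The picture-type-dependent tests described for the decision block 902 can be sketched as follows. This is an illustrative sketch under stated assumptions: the function names, the argument names, the default minimum gap of zero pictures, and the default minimum motion value of 2000 (chosen from within the stated range of about 1000 to about 4000) are hypothetical, and treating a non-positive previous activity as "no scene change" is a choice made only for the example.

```python
def ratio_indicates_change(current: float, previous: float,
                           factor: float = 2.0) -> bool:
    """True when current is at least factor times previous,
    or less than previous / factor (e.g. the (2, 1/2) pair)."""
    if previous <= 0:
        return False  # assumption: no usable history, so no decision
    return current >= factor * previous or current < previous / factor


def i_picture_scene_change(sact_avg: float, prev_sact_avg: float,
                           pictures_since_last_change: int,
                           min_gap: int = 0, factor: float = 2.0) -> bool:
    """I-picture test on average spatial activity, with an optional
    minimum number of pictures since the last declared scene change."""
    if pictures_since_last_change < min_gap:
        return False
    return ratio_indicates_change(sact_avg, prev_sact_avg, factor)


def p_picture_scene_change(motion_avg: float, prev_motion_avg: float,
                           min_motion: float = 2000.0,
                           factor: float = 2.0) -> bool:
    """P-picture test on average motion activity; the current picture must
    also reach a minimum average motion activity (hypothetical default)."""
    if motion_avg < min_motion:
        return False
    return ratio_indicates_change(motion_avg, prev_motion_avg, factor)
```

Keeping the ratio test in one helper lets the (3, 1/3) or (4, 1/4) factor pairs be selected by changing a single argument.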

The process proceeds from the decision block 902 to end such as, for example, by entering the state 614 when the process determines that there has been no scene change. In addition, it will be understood that there may be other portions of the encoding process which determine whether there has been a scene change, and where applicable, a previous determination can be reused in the decision block 902 by inspection of the state of a flag or semaphore indicating whether there has been a scene change. When the process determines that there has been a scene change, the process proceeds from the decision block 902 to a sub-process 906.

In the decision block 904, the process determines whether the encoding process is in a critical state. In an alternate embodiment of the process, only one of the decision block 902 or the decision block 904 is present, and the other is optional. Where the decision block 904 is present in the process, the monitoring of the occupancy of the VBV buffer model can be invoked after the encoding of a picture. The criteria for determining that the encoding process is in a critical state can vary in a very broad range. In one embodiment, the critical state corresponds to when bit stuffing is performed by the encoding process when a value for the quantization parameter mquant_(j) is not relatively low, such as not at its lowest possible value. The value for the quantization parameter mquant_(j) that will correspond to relatively low values, such as the lowest possible value, will vary according to the syntax of the encoding standard. The process proceeds from the decision block 904 to the sub-process 906 when the occupancy of the VBV buffer model is determined to be in the critical state. Otherwise, the process proceeds to end such as, for example, by entering the state 627 of the process described in connection with FIG. 6.

In the sub-process 906, the process normalizes the virtual buffer occupancy values for the initial conditions as represented by the variables d₀^(i), d₀^(p), and d₀^(b) described earlier in connection with the state 612. The normalized values can be computed by a variety of techniques. In the illustrated sub-process 906, the normalized values depend on the occupancy level of the VBV buffer model. The illustrated sub-process 906 includes a state 908, a decision block 910, a state 912, and a state 914.

In the state 908, one embodiment of the process calculates values for a sum and a delta as set forth in Equations 35 and 36a or 36b.

sum = d₀^(i) + d₀^(p) + d₀^(b)  (Eq. 35)

delta = vbv_buffer_size − V_(status)  (Eq. 36a)

delta = V_(initial) − V_(status)  (Eq. 36b)

For Equation 35, the values for the virtual buffer occupancy levels for the initial conditions can be obtained by application of Equations 9, 10, and 11 as described in greater detail earlier in connection with the state 612 of FIG. 6. As illustrated in Equations 36a and 36b, delta increases with a decreasing occupancy level in a buffer model. In Equation 36a, the variable vbv_buffer_size relates to the capacity of the VBV buffer model that is used for encoding. In Equation 36b, the variable V_(initial) relates to an initialization value for the occupancy level of the VBV buffer model. In one embodiment, the value of V_(initial) is about 7/8 of the capacity of the VBV buffer model. In another embodiment, instead of V_(initial), the process can use a target occupancy level such as V_(target), but it should be noted that the initialization value and the target occupancy can be the same value. In another embodiment, delta can be based on a different quantity related to the size of the buffer model less the occupancy level of the buffer model. The size or capacity of the VBV buffer model can vary according to the standard that is used for encoding. For example, as described earlier in connection with FIG. 4, the MPEG-1 and the MPEG-2 encoding standards specify a VBV buffer size of about 40 kB and about 224 kB, respectively. Other standards can specify other amounts of memory capacity for the VBV buffer model. The process advances from the state 908 to the decision block 910.

In the decision block 910, the process determines whether the value for sum is less than the value for a predetermined threshold T_(norm). The value of the predetermined threshold T_(norm) should correspond to some value that indicates a usable range. For example, one such value for the predetermined threshold T_(norm) is zero. Other values will be readily determined by one of ordinary skill in the art. The process proceeds from the decision block 910 to the state 912 when the value for sum is less than the value T_(norm). Otherwise, the process proceeds from the decision block 910 to the state 914.

The value for delta corresponds to the unoccupied space in the VBV buffer model for Equation 36a or to the discrepancy between the initial VBV buffer model status and the current VBV buffer model status in Equation 36b. It will be understood that other comparisons can be made between the sum of the virtual buffer levels and the unoccupied levels. For example, in another embodiment, a less than or equal to comparison can be made, an offset can be included, etc.

In the state 912, one embodiment of the process reassigns the virtual buffer occupancy values for the initial conditions d₀^(i), d₀^(p), and d₀^(b) with normalized values according to Equations 37, 38, and 39.

d₀^(i) = delta·fr^(i)  (Eq. 37)

d₀^(p) = delta·fr^(p)  (Eq. 38)

d₀^(b) = delta·fr^(b)  (Eq. 39)

In Equations 37, 38, and 39, the value for delta can be calculated from Equation 36a or Equation 36b, and the values for fr^(i), fr^(p), and fr^(b) can vary in a very broad range. The values for fr^(i), fr^(p), and fr^(b) will typically range between 0 and 1 and can be the same value or different values. Further, in one embodiment, the values for fr^(i), fr^(p), and fr^(b) are selected such that they sum to a value of approximately 1, such as the value of 1. In one embodiment, the values for fr^(i), fr^(p), and fr^(b) correspond to about 5/17, about 5/17, and about 7/17, respectively. Other values for fr^(i), fr^(p), and fr^(b) will be readily determined by one of ordinary skill in the art. The process can then end by, for example, entering the state 614 of the process described in connection with FIG. 6.

Returning to the state 914, at this point in the process, the process has determined that the value for sum is not less than the value for T_(norm). In the state 914, one embodiment of the process reassigns the values of the virtual buffer occupancy variables for the initial conditions d₀^(i), d₀^(p), and d₀^(b) with normalized values according to Equations 40, 41, and 42.

$\begin{matrix} d_{0}^{i} = d_{0}^{i} \cdot \dfrac{\mathit{delta}}{\mathit{sum}} & (\text{Eq. } 40) \\ d_{0}^{p} = d_{0}^{p} \cdot \dfrac{\mathit{delta}}{\mathit{sum}} & (\text{Eq. } 41) \\ d_{0}^{b} = d_{0}^{b} \cdot \dfrac{\mathit{delta}}{\mathit{sum}} & (\text{Eq. } 42) \end{matrix}$

Equations 40, 41, and 42 correspond to assignment statements for the values of the virtual buffer occupancy variables for the initial conditions d₀^(i), d₀^(p), and d₀^(b). The values to the right of the “=” correspond to the values before adjustment, and the values to the left of the “=” correspond to the values after adjustment. It will be observed that when the value for delta and the value for sum are approximately the same, relatively little adjustment to the values occurs. When the value for sum is relatively high compared to the value for delta, the values of the virtual buffer occupancy variables for the initial conditions d₀^(i), d₀^(p), and d₀^(b) are reduced proportionally. It should also be noted that relatively small values can also be added to the value of sum used in Equations 40-42 to prevent division by zero problems. After adjustment, the process ends by, for example, proceeding to the state 614 of the process described earlier in connection with FIG. 6.
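The sub-process 906 (Equations 35 through 42) can be sketched as a single function. This is a minimal sketch, not a definitive implementation: the function and parameter names are hypothetical, `reference_level` stands for either vbv_buffer_size (Eq. 36a) or V_(initial) (Eq. 36b), the fractions default to the 5/17, 5/17, 7/17 example, and the small `epsilon` added to sum is one assumed way of guarding against division by zero.

```python
from typing import Tuple

Triple = Tuple[float, float, float]


def normalize_initial_occupancies(
    d0: Triple,                    # (d0_i, d0_p, d0_b) before adjustment
    v_status: float,               # current VBV buffer model occupancy
    reference_level: float,        # vbv_buffer_size (36a) or V_initial (36b)
    t_norm: float = 0.0,           # threshold T_norm; zero is one example
    fractions: Triple = (5 / 17, 5 / 17, 7 / 17),
    epsilon: float = 1e-9,         # assumed guard against division by zero
) -> Triple:
    total = d0[0] + d0[1] + d0[2]          # Eq. 35
    delta = reference_level - v_status     # Eq. 36a or Eq. 36b
    if total < t_norm:
        # State 912: reassign from delta and fixed fractions (Eqs. 37-39).
        return (delta * fractions[0],
                delta * fractions[1],
                delta * fractions[2])
    # State 914: rescale each value by delta / sum (Eqs. 40-42).
    scale = delta / (total + epsilon)
    return (d0[0] * scale, d0[1] * scale, d0[2] * scale)


if __name__ == "__main__":
    # sum = 800 and delta = 400, so each value is roughly halved.
    print(normalize_initial_occupancies((300.0, 300.0, 200.0), 600.0, 1000.0))
```

In both branches the adjusted values sum to approximately delta when the fractions sum to 1, which ties the virtual buffer levels to the unoccupied space in the buffer model.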

FIG. 9B is a flowchart that generally illustrates a process for resetting virtual buffer occupancy levels upon the detection of an irregularity in a final buffer occupancy level. The process for resetting can be incorporated into encoding processes, such as in the state 627 of the rate control and quantization control process described earlier in connection with FIG. 6.

The process begins at a decision block 952. As explained earlier in connection with the state 627 of the rate control and quantization control process described in connection with FIG. 6, the final occupancy (fullness) of the applicable virtual buffer, i.e., the value of d_(j)^(i), d_(j)^(p), or d_(j)^(b), where j=MB_cnt, can be used as the initial condition for the encoding of the next picture of the same type, i.e., as the value for d₀^(i), d₀^(p), or d₀^(b) for the picture of the same type (I, P, or B). When encoding via the process described in TM5, the final occupancy of the applicable virtual buffer, i.e., the value of d_(j)^(i), d_(j)^(p), or d_(j)^(b), is always used as the initial condition for the encoding of the next picture of the same type. However, the final occupancy of the applicable virtual buffer is not always an appropriate value to use.

In the decision block 952, the process determines whether the final occupancy of the applicable virtual buffer, i.e., the value of d_(j)^(i), d_(j)^(p), or d_(j)^(b), is appropriate to use. In one embodiment, the appropriateness of a value is determined by whether the value is physically possible. A virtual buffer models a physical buffer. A physical buffer can be empty, can be partially occupied with data, or can be fully occupied with data. However, a physical buffer cannot hold a negative amount of data. To distinguish between physically attainable values and non-physically attainable values, one embodiment of the process compares the value for the final occupancy of the applicable virtual buffer to a predetermined threshold tr.

In one embodiment, the value of tr is zero to distinguish between a physically attainable buffer occupancy and a buffer occupancy that is not physically attainable. In one embodiment, a value that is relatively close to zero is used. Although the value of tr can correspond to a range of values, including values near to zero such as one, two, three, etc., the value of tr should not permit a negative value for the final occupancy to be deemed appropriate. It will be understood that when the value used for tr is zero, the process can distinguish between physically attainable values and non-physically attainable values by inspecting the sign, i.e., positive or negative, associated with the value of the final occupancy of the applicable virtual buffer. It will also be understood that when integer comparisons are made, a comparison using an inequality such as greater than negative one, i.e., >−1, can also be used, such that a value for tr can correspond to −1. The process proceeds from the decision block 952 to a state 954 when the final occupancy value is not appropriate to use as an initial condition for the next picture of the same type. Otherwise, the process proceeds from the decision block 952 to a state 956.

In the state 954, the process resets the final buffer occupancy value for the picture type that had just been encoded, d_(j)^(i), d_(j)^(p), or d_(j)^(b), where j=MB_cnt, to an appropriate value, such as a physically attainable value. Appropriate values can include any value from zero to the capacity of the applicable virtual buffer. In one embodiment, the final buffer occupancy value is reset to a relatively low value that is near zero, such as zero itself. The process can advance from the state 954 to an optional state 958, or the process can advance from the state 954 to the state 956.

In the optional state 958, the process normalizes the virtual buffer occupancy values d_(j)^(i), d_(j)^(p), and d_(j)^(b). In the prior state 954, the process had corrected for a non-physically attainable value in the virtual occupancy value d_(j)^(i), d_(j)^(p), or d_(j)^(b) that applies to the type of picture that was encoded. For example, the process can take the prior negative value of the applicable virtual occupancy value d_(j)^(i), d_(j)^(p), or d_(j)^(b), and allocate the negative value to the remaining virtual occupancy values such that the virtual occupancy values d_(j)^(i), d_(j)^(p), and d_(j)^(b) sum to zero. For example, in one embodiment, the process adds half of the negative value to each of the two other virtual occupancy values. The process advances from the optional state 958 to the state 956.

In the state 956, the process stores the final virtual buffer occupancy value as reset by the state 954 or unmodified via the decision block 952 and ends. The process can end by, for example, proceeding to the state 619 of the rate control and quantization control process described earlier in connection with FIG. 6.
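The reset process of FIG. 9B can be sketched as follows. This is an illustrative sketch only: the dictionary representation keyed by picture type, the function name, and the `redistribute` switch standing in for the optional state 958 are assumptions, as is the choice of a "less than tr" test with tr equal to zero and a reset value of zero.

```python
from typing import Dict


def reset_final_occupancy(d_final: Dict[str, float], picture_type: str,
                          tr: float = 0.0,
                          redistribute: bool = False) -> Dict[str, float]:
    """Return the final virtual buffer occupancies to store (state 956).

    d_final maps "i", "p", "b" to the final occupancies d_j; picture_type is
    the type just encoded. All names here are hypothetical.
    """
    result = dict(d_final)
    value = result[picture_type]
    if value < tr:                      # decision block 952: not attainable
        result[picture_type] = 0.0      # state 954: reset to zero
        if redistribute:                # optional state 958
            others = [t for t in result if t != picture_type]
            for t in others:
                # With two other picture types, each receives half of the
                # prior negative value.
                result[t] += value / len(others)
    return result


if __name__ == "__main__":
    levels = {"i": -10.0, "p": 40.0, "b": 20.0}
    print(reset_final_occupancy(levels, "i"))
    print(reset_final_occupancy(levels, "i", redistribute=True))
```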

Scene Change within a Group of Pictures

FIG. 10A illustrates examples of groups of pictures. Scene changes between pictures of a sequence can exist within a group of pictures. Scene changes are relatively commonly encountered in a sequence of pictures. The scene changes can result from a change in camera shots, a switching between programs, a switch to a commercial, an edit, and the like. With a scene change, the macroblocks of a present picture bear little or no relation to the macroblocks of a previous picture, so that the macroblocks of the present picture will typically be intra coded, rather than predictively coded. Since an I-picture includes only intra-coded macroblocks, scene changes are readily accommodated with I-pictures.

Although pictures corresponding to scene changes are preferably coded with I-pictures, the structure of a group of pictures, i.e., the sequence of picture types, can be predetermined in some systems or outside of the control of the encoder. For example, one direct broadcast satellite (DBS) system has a predetermined pattern of I-pictures, P-pictures, and B-pictures that is followed by the encoder. As a result, scene changes can occur in B-pictures or in P-pictures. A conventional encoder can accommodate scene changes in B-pictures by referencing the predictive macroblocks of the B-picture to an I-picture or to a P-picture that is later in time.

A scene change in a P-picture can be problematic. A P-picture can include intra-coded macroblocks and can include predictively-coded macroblocks. However, a P-picture cannot reference a picture that is later in time, so that the scene change will typically be encoded using only intra-coded macroblocks. In substance, a scene change P-picture in a conventional encoder is an I-picture, but with the bit allocation and the header information of a P-picture. In a conventional encoder, a P-picture is allocated fewer bits than an I-picture so that the picture quality of a scene change P-picture is noticeably worse than for an I-picture. Other pictures, such as B-pictures and other P-pictures, can be predictively coded from the P-picture with the scene change, thereby disadvantageously propagating the relatively low picture quality of the scene change P-picture.

As described earlier in connection with FIGS. 1 and 5, the pictures of a sequence are arranged into groups of pictures. A group starts with an I-picture and ends with the picture immediately prior to a subsequent I-picture. The pictures within a group of pictures can be arranged in a different order for presentation and for encoding. For example, a first group of pictures 1002 in a presentation order is illustrated in FIG. 10A. An I-picture 1004 for a next group of pictures is also shown in FIG. 10A.

The pictures of a sequence can be rearranged from the presentation order when encoding and decoding. For example, the first group of pictures 1002 can be rearranged to a second group of pictures 1010, where the group is a first group of a sequence, and can be rearranged to a third group of pictures 1020, where the group is an ongoing part of the sequence. The second group of pictures 1010 and the third group of pictures 1020 are illustrated in encoding order. The end of the second group of pictures 1010 occurs when an I-picture 1012 from another group is encountered. Due to the reordering, two B-pictures 1014, 1016 that were originally in the first group of pictures 1002 in the presentation order are now no longer in the group of pictures as rearranged for encoding. With respect to the process described in connection with FIG. 10B, a group of pictures relates to a group in an encoding order.

The third group of pictures 1020 will be used to describe the process illustrated in FIG. 10B. The third group of pictures 1020 includes two pictures 1022, 1024 that will be presented before the I-picture 1026 of the third group of pictures 1020. In the illustrated example, a scene change occurs in the third group of pictures 1020 at a P-picture 1030 within the third group of pictures 1020. The process described in FIG. 10B advantageously recognizes the scene change and reallocates the remaining bits for the remaining pictures 1032 in the third group of pictures 1020 to improve picture quality.

FIG. 10B is a flowchart that generally illustrates a process for resetting encoding parameters upon the detection of a scene change within a group of pictures (GOP). In the illustrated embodiment of the process, the encoding order is used to describe the grouping of groups of pictures.

The process illustrated in FIG. 10B identifies scene-change P-pictures and advantageously reallocates bits within the remaining pictures of the group of pictures without changing the underlying structure of the group of pictures. The process advantageously allocates relatively more bits to the scene change P-picture, thereby improving picture quality. The illustrated process can be incorporated into the rate control and quantization control process described earlier in connection with FIG. 6. For example, the process of FIG. 10B can be incorporated before the state 610 of FIG. 6. The skilled practitioner will appreciate that the illustrated process can be modified in a variety of ways without departing from the spirit and scope of the invention. For example, in another embodiment, various portions of the illustrated process can be combined, can be rearranged in an alternate sequence, can be removed, and the like.

The process begins at a decision block 1052. In the decision block 1052, the process determines whether there has been a scene change or a relatively sudden increase in an amount of motion in a picture. The scene change can be determined by a variety of techniques. In one embodiment, the process makes use of computations of picture comparisons that are already available. For example, one embodiment of the process uses a sum of absolute differences (SAD) measurement. The SAD measurement can be compared to a predetermined value, to a moving average, or to both to determine a scene change. For example, a SAD measurement that exceeds a predetermined level, or a SAD measurement that exceeds double the moving average of the SAD, can be used to detect a scene change. Advantageously, the SAD measurement can detect a scene change or a sudden increase in an amount of motion in a picture. It will be understood that there may be another portion of the encoding process that also monitors for a scene change, and in one embodiment, the results of another scene change detection are reused in the decision block 1052. The process proceeds from the decision block 1052 to a decision block 1054 when a scene change is detected. Otherwise, the process proceeds to end, such as, for example, entering the state 610 of the rate control and quantization control process described earlier in connection with FIG. 6.
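The SAD-based test of the decision block 1052 can be sketched as follows. This is a minimal sketch under stated assumptions: the class name, the fixed-length window used for the moving average, and the choice to include every sample (including a detected scene change) in that average are hypothetical; only the two comparisons, against a predetermined level and against double the moving average, come from the description above.

```python
from collections import deque


class SadSceneChangeDetector:
    """Flags a scene change when the SAD exceeds a predetermined level or
    exceeds ratio times the moving average of recent SAD values."""

    def __init__(self, level: float, ratio: float = 2.0, window: int = 8):
        self.level = level                   # predetermined SAD level
        self.ratio = ratio                   # 2.0 corresponds to "double"
        self.history = deque(maxlen=window)  # assumed moving-average window

    def update(self, sad: float) -> bool:
        detected = sad > self.level
        if self.history:
            moving_average = sum(self.history) / len(self.history)
            detected = detected or sad > self.ratio * moving_average
        self.history.append(sad)
        return detected


if __name__ == "__main__":
    detector = SadSceneChangeDetector(level=10000.0)
    for sad in [100.0, 110.0, 90.0, 400.0, 120.0]:
        print(sad, detector.update(sad))
```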

In the decision block 1054, the process determines whether the type of the picture to be encoded corresponds to the P-type. In another embodiment, the order of the decision block 1052 and the decision block 1054 is interchanged from that shown in FIG. 10B. The process proceeds from the decision block 1054 to a state 1056 when the picture is to be encoded as a P-picture. Otherwise, the process proceeds to end by, for example, entering the state 610 of the rate control and quantization control process described earlier in connection with FIG. 6.

In the state 1056, the process reallocates bits among the remaining pictures of the group of pictures. Using the third group of pictures 1020 of FIG. 10A as an example, when a scene change is detected at the P-picture 1030, the remaining bits R are advantageously reallocated among the remaining pictures 1032. In one embodiment, the process encodes the remaining pictures 1032 as though the P-picture 1030 is an I-picture, but without altering the structure of the group of pictures, i.e., without changing the picture type of the P-picture 1030.

The process for encoding the P-picture 1030 as though it is an I-picture can be performed in a number of ways. For example, one embodiment of the process effectively decrements the number of P-pictures N_(p) to be encoded before the P-picture with the scene change is encoded, and uses the decremented value of N_(p) in Equation 6 to generate a targeted bit allocation. Equation 6, which is used in a conventional system only to calculate a targeted bit allocation T_(i) for an I-picture, can be used by the process of FIG. 10B to calculate a targeted bit allocation for the P-picture with the scene change. Equation 43 illustrates an expression of such a targeted bit allocation, expressed as T_(p′).

$\begin{matrix} T_{p^{\prime}} = \max\left\{ \dfrac{R}{1 + \dfrac{(N_{p} - 1)X_{p}}{X_{i}K_{p}} + \dfrac{N_{b}X_{b}}{X_{i}K_{b}}},\ \dfrac{bit\_rate}{8 \cdot picture\_rate} \right\} & (\text{Eq. } 43) \end{matrix}$

This advantageously allocates to the P-picture a relatively large number of bits, such that the P-picture with the scene change can encode the scene change with relatively high quality. Equations 7 and 8 can then be used for the subsequent encoding of P-pictures and B-pictures that remain to be encoded in the group of pictures. Optionally, the process can further reset the values for the complexity estimators X_(i), X_(p), and X_(b) in response to the scene change by, for example, applying Equations 1-3 described earlier in connection with the state 608 of the rate control and quantization control process of FIG. 6. The process then ends by, for example, proceeding to the state 610 of the rate control and quantization control process. It will be understood that the process described in connection with FIGS. 10A and 10B can be repeated when there is more than one scene change in a group of pictures.
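Equation 43 can be evaluated directly, as in the following sketch. The function name and the numeric inputs in the usage example are hypothetical values chosen only to exercise the formula; the argument names mirror the symbols of Equation 43, and `n_p` is the number of P-pictures before the decrement, so the function applies the (N_(p) − 1) term itself.

```python
def scene_change_p_target(r: float, n_p: int, n_b: int,
                          x_i: float, x_p: float, x_b: float,
                          k_p: float, k_b: float,
                          bit_rate: float, picture_rate: float) -> float:
    """Targeted bit allocation T_p' for a scene-change P-picture (Eq. 43)."""
    denominator = (1.0
                   + ((n_p - 1) * x_p) / (x_i * k_p)
                   + (n_b * x_b) / (x_i * k_b))
    # The second operand of max() is the lower bound of Eq. 43.
    return max(r / denominator, bit_rate / (8.0 * picture_rate))


if __name__ == "__main__":
    # Illustrative inputs only (not taken from the description).
    target = scene_change_p_target(
        r=1_000_000, n_p=3, n_b=8,
        x_i=160.0, x_p=60.0, x_b=42.0,
        k_p=1.0, k_b=1.4,
        bit_rate=4_000_000, picture_rate=30.0)
    print(round(target))
```

With these inputs the denominator is approximately 3.25, so roughly 30% of the remaining bits R go to the scene-change P-picture, well above the lower bound of bit_rate/(8·picture_rate).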

Selective Skipping of Macroblocks in B-Pictures

FIG. 11 is a flowchart that generally illustrates a process for the selective skipping of data in a video encoder. This selective skipping of data advantageously permits the video encoder to maintain relatively good bit rate control even in relatively extreme conditions. The selective skipping of data permits the video encoder to produce encoded data streams that advantageously reduce or eliminate relatively low occupancy levels in a decoder buffer, such as decoder buffer underrun. Decoder buffer underrun can occur when the playback bit rate exceeds the relatively constant bit rate of the data channel for a sustained period of time such that the decoder buffer runs out of data. Decoder buffer underrun is quite undesirable and results in a discontinuity such as a pause in the presentation.

Even without an occurrence of decoder buffer underrun, data streams that result in relatively low decoder buffer occupancy levels can be undesirable. As explained earlier in connection with FIG. 4, a buffer model, such as the VBV buffer model, is typically used in an encoding process to model the occupancy levels of a decoder buffer. When a conventional encoder determines that the occupancy level of the buffer model is dangerously low, the conventional encoder can severely compromise picture quality in order to conserve encoding bits and maintain bit rate control. The effects of relatively low VBV buffer model occupancy levels are noticeable in the severely degraded quality of macroblocks.

The process generally illustrated by the flowchart of FIG. 11 advantageously skips the encoding of selected macroblocks when relatively low buffer model occupancy levels are detected, thereby maintaining relatively good bit rate control by decreasing the number of bits used to encode the pictures in a manner that does not impact picture quality as severely as conventional techniques. In one example, the process illustrated in FIG. 11 can be incorporated in the state 623 of the rate control and quantization control process described earlier in connection with FIG. 6. The skilled practitioner will appreciate that the illustrated process can be modified in a variety of ways without departing from the spirit and scope of the invention. For example, in another embodiment, various portions of the illustrated process can be combined, can be rearranged in an alternate sequence, can be removed, and the like.

The process starts at a decision block 1102, where the processdetermines whether the picture designated to be encoded corresponds to aB-picture. B-pictures can be encoded with macroblocks that arepredictively coded based on macroblocks from other pictures (I-picturesor P-pictures) that are earlier in time or later in time in thepresentation order. However, during the encoding process, the pictures(I-pictures or P-pictures) that are used to encode a B-picture areencoded prior to the encoding of the B-picture. The process proceedsfrom the decision block 1102 to a decision block 1104 when the pictureto be encoded is a B-picture. Otherwise, the process proceeds to end,by, for example, returning to the state 623 of the process describedearlier in connection with FIG. 6.

In the decision block 1104, the process determines whether the VBVbuffer occupancy level is relatively low. During the encoding process, arelatively large number of bits may have already been consumed in theencoding of the pictures from which a B-picture is to be encoded. Insome circumstances, this consumption of data can lead to a low VBVbuffer occupancy level. For example, the process can monitor theoccupancy level V_(status) of the VBV buffer model, which was describedearlier in connection with FIG. 7, and compare the occupancy levelV_(status) to a predetermined threshold, such as to V_(critical). Thecomparison can be made in a variety of points in the encoding process.In one embodiment, the comparison is made after a picture has beenencoded and after the VBV buffer model occupancy level has beendetermined, such as after the state 638 or after the state 610 of therate control and quantization control process described earlier inconnection with FIG. 6. In one embodiment, the comparison isadvantageously made before any of the macroblocks in the picture havebeen encoded, thereby advantageously preserving the ability to skip allof the macroblocks in the picture when desired to conserve a relativelylarge amount of bits.

In one example, V_(critical) is set to about 1/4 of the capacity of the VBV buffer model. It should be noted that the capacity of the VBV buffer model or similar buffer model can vary with the encoding standard. It will be understood that an appropriate value for V_(critical) can be selected from within a broad range. For example, other values such as 1/16, 1/8, 1/10, and 3/16 of the capacity of the VBV buffer model can also be used. Other values will be readily determined by one of ordinary skill in the art. In one embodiment, the process permits the setting of V_(critical) to be configured by a user. The process proceeds from the decision block 1104 to a state 1106 when the occupancy level V_(status) of the VBV buffer model falls below the predetermined threshold. Otherwise, the process proceeds from the decision block 1104 to a state 1108.
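
By way of illustration only, the threshold test of the decision block 1104 can be sketched as follows. The sketch is not the disclosed implementation: the function names, the use of `fractions.Fraction`, and the example capacity of 1,835,008 bits (the VBV buffer size limit for MPEG-2 Main Profile at Main Level) are assumptions chosen for the example.

```python
from fractions import Fraction

# Assumed example capacity in bits (MPEG-2 MP@ML VBV buffer size limit).
VBV_CAPACITY_BITS = 1_835_008


def critical_threshold(capacity_bits, fraction=Fraction(1, 4)):
    """Return V_critical, in bits, as a fraction of the buffer model capacity."""
    if not 0 < fraction < 1:
        raise ValueError("fraction must lie strictly between 0 and 1")
    return int(capacity_bits * fraction)


def occupancy_is_low(v_status, v_critical):
    """Decision block 1104: True when V_status falls below V_critical."""
    return v_status < v_critical


v_critical = critical_threshold(VBV_CAPACITY_BITS)
print(v_critical)                             # 458752
print(occupancy_is_low(400_000, v_critical))  # True
print(occupancy_is_low(900_000, v_critical))  # False
```

Other fractions mentioned above, such as 1/16 or 3/16, can be supplied through the hypothetical `fraction` parameter, which also models a user-configurable V_(critical).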

In the state 1106, the process skips macroblocks in the B-picture. In one embodiment, all the macroblocks are skipped. In another embodiment, selected macroblocks are skipped. The number of macroblocks skipped can be based on, for example, the occupancy level V_(status) of the VBV buffer. Data for an “encoded” B-picture is still formed, but with relatively little data for the skipped macroblocks. In the encoding process, a bit or flag in the data stream indicates a skipped macroblock. For example, in a technique known as “direct mode,” a flag indicates that the skipped macroblock is to be interpolated during decoding between the macroblocks of a prior and a later (in presentation time) I- or P-picture. Another flag indicates that the skipped macroblock is to be copied from a macroblock in an I- or P-picture that is prior in presentation time. Yet another flag indicates that the skipped macroblock is to be copied from a macroblock in an I- or P-picture that is later in presentation time. The skipping of macroblocks can advantageously encode a B-picture in relatively few bits. In one example, a B-picture for MPEG-2 with all the macroblocks skipped can advantageously be encoded using only about 300 bits. After the skipping of macroblocks for the B-picture is complete, the process ends by, for example, returning to the state 623 of the process described earlier in connection with FIG. 6.

In the state 1108, the process has determined that the occupancy level V_(status) of the VBV buffer is not relatively low, and the process encodes the macroblocks in the B-picture. After the encoding of the macroblocks for the B-picture is complete, the process ends by, for example, returning to the state 623 of FIG. 6. It will be understood that the decisions embodied in the decision block 1102 and/or the decision block 1104 can be performed at a different point in the process of FIG. 6 than the state 1106 or the state 1108.
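
The flow through the decision blocks 1102 and 1104 to the states 1106 and 1108 can be sketched as follows. This is a minimal illustration under stated assumptions rather than the disclosed implementation: the `PictureType` and `Action` enumerations and the function `b_picture_decision` are hypothetical names, and the sketch only selects a state without performing any macroblock encoding.

```python
from enum import Enum


class PictureType(Enum):
    I = "I"
    P = "P"
    B = "B"


class Action(Enum):
    # Hypothetical labels for the three possible outcomes of the process.
    NOT_APPLICABLE = "not a B-picture; return to the process of FIG. 6"
    SKIP_MACROBLOCKS = "state 1106: skip macroblocks"
    ENCODE_MACROBLOCKS = "state 1108: encode macroblocks"


def b_picture_decision(picture_type, v_status, v_critical):
    """Follow decision blocks 1102 and 1104 and name the resulting state."""
    # Decision block 1102: only B-pictures are candidates for skipping.
    if picture_type is not PictureType.B:
        return Action.NOT_APPLICABLE
    # Decision block 1104: is the occupancy level below the threshold?
    if v_status < v_critical:
        return Action.SKIP_MACROBLOCKS
    return Action.ENCODE_MACROBLOCKS


print(b_picture_decision(PictureType.P, 100_000, 458_752).name)  # NOT_APPLICABLE
print(b_picture_decision(PictureType.B, 100_000, 458_752).name)  # SKIP_MACROBLOCKS
print(b_picture_decision(PictureType.B, 900_000, 458_752).name)  # ENCODE_MACROBLOCKS
```

Because the function depends only on the picture type and two occupancy values, it can be evaluated before any macroblock of the picture is encoded, which is the ordering that preserves the ability to skip all of the macroblocks.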

Various embodiments of the invention have been described above. Although this invention has been described with reference to these specific embodiments, the descriptions are intended to be illustrative of the invention and are not intended to be limiting. Various modifications and applications may occur to those skilled in the art without departing from the true spirit and scope of the invention as defined in the appended claims.

1. A method implemented in computer executable form in a video encoding process for adjusting a targeted bit allocation T for a picture that is to be encoded and transmitted to a decoder, the method comprising: computing at an encoder a targeted bit allocation T for encoding the picture and storing in an encoder buffer; determining a threshold from at least one desired occupancy level of a buffer model characterizing a decoder buffer status of the decoder; adaptively adjusting the targeted bit allocation T at the encoder at least partially in response to a comparison of an occupancy status of the buffer model with the threshold; and providing the adjusted targeted bit allocation T to the video encoding process.
2. The method as defined in claim 1, wherein the targeted bit allocation T prior to adaptive adjustment is an original targeted bit allocation T and the threshold is a first threshold, further comprising: comparing the original targeted bit allocation T to the first threshold; and adaptively adjusting the original targeted bit allocation T when the original targeted bit allocation T exceeds the first threshold but not otherwise.
3. The method as defined in claim 2, wherein the first threshold corresponds to T_(mid), where T_(mid) comprises: $T_{mid} = V_{status} - V_{mid}$, where V_(status) is the occupancy status and V_(mid) is the desired occupancy level.
4. The method as defined in claim 1, wherein the targeted bit allocation T prior to adaptive adjustment is an original targeted bit allocation T and the threshold is a second threshold, further comprising: comparing the original targeted bit allocation T to the second threshold; and adaptively adjusting the original targeted bit allocation T when the original targeted bit allocation T is below the second threshold but not otherwise.
5. The method as defined in claim 1, wherein the targeted bit allocation T prior to adaptive adjustment is an original targeted bit allocation T and the threshold includes a first threshold and a second threshold, further comprising: comparing the original targeted bit allocation T to at least one of the first threshold and the second threshold; and adaptively adjusting the original targeted bit allocation T when the original targeted bit allocation T exceeds the first threshold or when the targeted bit allocation T is below the second threshold, but not otherwise.
6. The method as defined in claim 1, wherein computing the targeted bit allocation T for I-pictures, for P-pictures, and for B-pictures corresponds to computing T_(i), T_(p), and T_(b), respectively, where T_(i), T_(p), and T_(b) further comprise: $T_{i} = \max\left\{ \frac{R}{1 + \frac{N_{p} X_{p}}{X_{i} K_{p}} + \frac{N_{b} X_{b}}{X_{i} K_{b}}},\ \frac{bit\_rate}{8 \cdot picture\_rate} \right\}$; $T_{p} = \max\left\{ \frac{R}{N_{p} + \frac{N_{b} K_{p} X_{b}}{K_{b} X_{p}}},\ \frac{bit\_rate}{8 \cdot picture\_rate} \right\}$; and $T_{b} = \max\left\{ \frac{R}{N_{b} + \frac{N_{p} K_{b} X_{p}}{K_{p} X_{b}}},\ \frac{bit\_rate}{8 \cdot picture\_rate} \right\}$, where R is a remaining number of bits allocated to a group of pictures to which the picture belongs, N_(p) and N_(b) are a number of P-pictures and B-pictures in the group of pictures, respectively, K_(p) and K_(b) are constants associated with complexity matrices for P-pictures and B-pictures, respectively, X_(i), X_(p) and X_(b) are complexity estimators for I-pictures, P-pictures and B-pictures, respectively, bit_rate is a bit rate of a transmission channel from the encoder to the decoder and picture_rate is a display rate of pictures.
7. The method as defined in claim 1, wherein the adaptively adjusting further comprises multiplying the targeted bit allocation T by a factor α, where α comprises: $\alpha = 1 + \frac{V_{status} - V_{target}}{V_{high} - V_{low}}$, where V_(status) is the occupancy status, V_(target) is a desired occupancy level of the buffer model, and V_(high) and V_(low) are high and low occupancy levels of the buffer model, respectively.
8. The method as defined in claim 7, where V_(target) is about 7/8 of a capacity of the buffer model, where V_(high) is about 63/64 of the capacity of the buffer model, and where V_(low) is about 3/8 of the capacity of the buffer model.
9. The method as defined in claim 7, wherein the targeted bit allocation T adaptively varies such that over time, the buffer occupancy level V_(status) of the buffer model trends to the desired buffer occupancy level V_(target).
10. The method as defined in claim 7, where the desired buffer occupancy level V_(target) is configured by a user.
11. The method as defined in claim 1, wherein the buffer model is a video buffer verifier (VBV) buffer model.
12. The method as defined in claim 1, wherein the video encoding process is performed in real time, and where a constant bit rate is used to update a calculation of the buffer occupancy level.
13. The method as defined in claim 1, further comprising bounding the targeted bit allocation T to a maximum value of T_(max), where T_(max) comprises: $T_{max} = V_{status} - V_{low}$, where V_(status) is the occupancy status and V_(low) is a low occupancy level of the buffer model.
14. The method as defined in claim 13, wherein V_(low) is about 3/8 of a capacity of the buffer model.
15. The method as defined in claim 1, wherein the video encoding process is performed in real time.
16. A method implemented in computer executable form in a video encoding process for calculating a targeted bit allocation T for a picture that is to be encoded and transmitted to a decoder comprising: computing at an encoder a targeted bit allocation T for encoding the picture and storing in an encoder buffer; determining a threshold from at least one desired occupancy level of a buffer model characterizing a decoder buffer status of the decoder; scaling the targeted bit allocation T at the encoder by a factor at least partially in response to a comparison of an occupancy status of the buffer model with the threshold; and providing the adjusted targeted bit allocation T to the video encoding process.
17. The method as defined in claim 16, wherein the factor is a factor α, where α comprises: $\alpha = 1 + \frac{V_{status} - V_{target}}{V_{high} - V_{low}}$, where V_(status) is the occupancy status, V_(target) is a desired occupancy level of the buffer model, and V_(high) and V_(low) are high and low occupancy levels of the buffer model, respectively.
18. The method as defined in claim 17, where V_(target) is about 7/8 of a capacity of the buffer model, where V_(high) is about 63/64 of the capacity of the buffer model, and where V_(low) is about 3/8 of the capacity of the buffer model.
19. The method as defined in claim 16, wherein the targeted bit allocation T adaptively varies such that over time, the buffer occupancy level of the buffer model trends to the desired buffer occupancy level.
20. The method as defined in claim 16, where the desired buffer occupancy level is configured by a user.
21. The method as defined in claim 16, wherein the buffer model is a video buffer verifier (VBV) buffer model.
22. The method as defined in claim 16, wherein the video encoding process is performed in real time, and where a constant bit rate is used to update a calculation of the buffer occupancy level.
23. The method as defined in claim 16, further comprising bounding the targeted bit allocation T to a maximum value of T_(max), where T_(max) comprises: $T_{max} = V_{status} - V_{low}$, where V_(status) is the occupancy status and V_(low) is a low occupancy level of the buffer model.
24. The method as defined in claim 23, wherein V_(low) is about 3/8 of a capacity of the buffer model.
25. A computer readable medium with computer executable instructions for adjusting a targeted bit allocation T for a picture that is to be encoded and transmitted to a decoder, comprising: instructions for computing a targeted bit allocation T for storage in an encoder buffer; instructions for determining a threshold from a desired occupancy level of a buffer model characterizing a decoder buffer status of the decoder; instructions for comparing an occupancy status of the buffer model with the threshold; and instructions for adaptively adjusting the targeted bit allocation T at least partially in response to the comparison.
26. A circuit for adjusting a targeted bit allocation T for a picture that is to be encoded in a real-time video encoder and transmitted to a decoder, comprising: means at an encoder for computing a targeted bit allocation T for encoding the picture and storing in an encoder buffer; means for determining a threshold from a desired occupancy level of a buffer model characterizing a decoder buffer status of the decoder; means for comparing an occupancy status of the buffer model with the threshold; means for adaptively adjusting the targeted bit allocation T at least partially in response to the comparison; and means for providing the adjusted targeted bit allocation T to the video encoding process.
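
For illustration only, and without limiting the claims, the computations recited in claims 6, 7, and 13 can be sketched as follows. The function names, the ordering of the steps (scaling by α before bounding by T_(max)), and all numeric inputs in the usage lines are assumptions made for the example. X_(i), X_(p), and X_(b) are the complexity estimators for I-, P-, and B-pictures, and K_(p) and K_(b) are the constants for P- and B-pictures. The sketch does not guard against zero-valued complexity estimators or against V_(status) falling below V_(low).

```python
def target_bits(picture_type, R, N_p, N_b, X_i, X_p, X_b, K_p, K_b,
                bit_rate, picture_rate):
    """Claim 6: original targeted bit allocation for an I-, P-, or B-picture."""
    floor = bit_rate / (8 * picture_rate)  # lower bound shared by all types
    if picture_type == "I":
        denom = 1 + (N_p * X_p) / (X_i * K_p) + (N_b * X_b) / (X_i * K_b)
    elif picture_type == "P":
        denom = N_p + (N_b * K_p * X_b) / (K_b * X_p)
    elif picture_type == "B":
        denom = N_b + (N_p * K_b * X_p) / (K_p * X_b)
    else:
        raise ValueError("picture_type must be 'I', 'P', or 'B'")
    return max(R / denom, floor)


def alpha(v_status, v_target, v_high, v_low):
    """Claim 7: alpha = 1 + (V_status - V_target) / (V_high - V_low)."""
    return 1 + (v_status - v_target) / (v_high - v_low)


def adjust_target(T, v_status, v_target, v_high, v_low):
    """Scale T by alpha (claim 7), then bound it by T_max (claim 13).

    Applying the two steps in this order is an assumption of the sketch.
    """
    scaled = T * alpha(v_status, v_target, v_high, v_low)
    t_max = v_status - v_low
    return min(scaled, t_max)


# Illustrative inputs only; none of these numbers come from the disclosure
# except the 7/8, 63/64, and 3/8 capacity fractions of claims 8 and 14.
capacity = 1_835_008
v_target = capacity * 7 // 8
v_high = capacity * 63 // 64
v_low = capacity * 3 // 8

T = target_bits("P", R=2_000_000, N_p=4, N_b=8, X_i=160.0, X_p=60.0,
                X_b=42.0, K_p=1.0, K_b=1.4, bit_rate=4_000_000,
                picture_rate=30)
print(T)
print(adjust_target(T, capacity // 2, v_target, v_high, v_low))
```

When V_(status) equals V_(target) the factor α is exactly 1 and the target is unchanged. An occupancy below V_(target) gives α less than 1 and shrinks the target, and an occupancy above V_(target) enlarges it, which is the behavior that lets the occupancy trend toward V_(target) over time.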